Summary: Let a sequence of iid. random variables $\xi_1,\dots,\xi_n$ be given on a
space $(X,\mathcal X)$ with distribution $\mu$ together with a nice class $\mathcal F$ of functions
$f(x_1,\dots,x_k)$ of $k$ variables on the product space $(X^k,\mathcal X^k)$. For all $f\in\mathcal F$
we consider the random integral $J_{n,k}(f)$ of the function $f$ with respect to
the $k$-fold product of the normalized signed measure $\sqrt n(\mu_n-\mu)$, where $\mu_n$
denotes the empirical measure defined by the random variables $\xi_1,\dots,\xi_n$, and
investigate the probabilities $P\left(\sup_{f\in\mathcal F}|J_{n,k}(f)|>x\right)$ for all $x>0$. We show
that for nice classes of functions, for instance if $\mathcal F$ is a Vapnik-Červonenkis
class, an almost as good bound can be given for these probabilities as in the
case when only the random integral of one function is considered.

1. Introduction. Formulation of the main results 

The following problem is studied in this paper: Let a probability measure $\mu$ be given on a
measure space $(X,\mathcal X)$, take a sequence $\xi_1,\dots,\xi_n$ of independent, identically distributed
$(X,\mathcal X)$ valued random variables with distribution $\mu$, and define the empirical measure $\mu_n$,

$$\mu_n(A)=\frac1n\#\{j\colon \xi_j\in A,\ 1\le j\le n\},\qquad A\in\mathcal X,$$

of the sample $\xi_1,\dots,\xi_n$. Let us take a nice set $\mathcal F$ of measurable functions $f(x_1,\dots,x_k)$
on the $k$-fold product space $(X^k,\mathcal X^k)$ and define the integrals $J_{n,k}(f)$ of the functions
$f\in\mathcal F$ with respect to the $k$-fold product of the normalized empirical measure $\mu_n$ by
the formula

$$J_{n,k}(f)=\frac{n^{k/2}}{k!}\int' f(u_1,\dots,u_k)(\mu_n(du_1)-\mu(du_1))\dots(\mu_n(du_k)-\mu(du_k)), \tag{1.1}$$

where the prime in $\int'$ means that the diagonals $u_j=u_l$, $1\le j<l\le k$,
are omitted from the domain of integration.
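
To make the objects in formula (1.1) concrete, here is a minimal sketch for the simplest case $k=1$, where no diagonal has to be removed and $J_{n,1}(f)=\sqrt n\,(\int f\,d\mu_n-\int f\,d\mu)$. The sample values, the choice of $\mu$ as the uniform distribution on $[0,1]$, and the function names are illustrative assumptions and are not taken from the paper.

```python
import math

def empirical_measure(sample, indicator):
    """mu_n(A) = #{j : xi_j in A} / n for a set A given by its indicator."""
    return sum(1 for x in sample if indicator(x)) / len(sample)

def j_n1(sample, f, integral_f):
    """J_{n,1}(f) = sqrt(n) * (integral of f d mu_n - integral of f d mu).

    integral_f is the exact value of the integral of f with respect to mu,
    supplied by the caller (here mu is assumed to be uniform on [0, 1]).
    """
    n = len(sample)
    empirical_integral = sum(f(x) for x in sample) / n
    return math.sqrt(n) * (empirical_integral - integral_f)

# Hypothetical sample of size n = 4 and the indicator function of [0, 1/2].
sample = [0.1, 0.2, 0.3, 0.9]
f = lambda u: 1.0 if u <= 0.5 else 0.0
value = j_n1(sample, f, integral_f=0.5)  # sqrt(4) * (3/4 - 1/2)
```

For $k\ge2$ the same idea applies, but the product measure has to be expanded and the diagonal terms discarded, which is exactly the bookkeeping carried out later in Section 2.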

In this work I try to give a good estimate on the probabilities $P\left(\sup_{f\in\mathcal F}|J_{n,k}(f)|>x\right)$
for all $x>0$. To formulate the main result of the paper first I introduce the following
definition.

Definition of $L_p$-dense classes of functions. Let us have a measure space $(Y,\mathcal Y)$
and a set $\mathcal G$ of $\mathcal Y$ measurable functions on this space. We call $\mathcal G$ an $L_p$-dense class with
parameter $D$ and exponent $L$ if for all numbers $1\ge\varepsilon>0$ and probability measures $\nu$
on the space $(Y,\mathcal Y)$ there exists a finite $\varepsilon$-dense subset $\mathcal G_{\varepsilon,\nu}=\{g_1,\dots,g_m\}\subset\mathcal G$ in the
space $L_p(Y,\mathcal Y,\nu)$ consisting of $m\le D\varepsilon^{-L}$ elements, i.e. such a set $\mathcal G_{\varepsilon,\nu}\subset\mathcal G$ for which
$\inf_{g_j\in\mathcal G_{\varepsilon,\nu}}\int|g-g_j|^p\,d\nu<\varepsilon^p$ for all functions $g\in\mathcal G$. (Here the set $\mathcal G_{\varepsilon,\nu}$ may depend on the
measure $\nu$, but its cardinality is bounded by a number depending only on $\varepsilon$.)
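
The definition can be illustrated on a toy case. The sketch below builds an $\varepsilon$-dense subset of the indicator functions $\{1_{[0,t]}\}$ in $L_2(\nu)$ for one particular discrete measure $\nu$ (equal mass on finitely many ordered atoms). The class, the measure and the greedy construction are assumptions chosen for illustration; the definition itself requires such a net for every probability measure $\nu$.

```python
from fractions import Fraction

def greedy_net(num_atoms, eps_sq):
    """Indices of the centers of an eps-dense subset of {1_[0,t]} in L_2(nu).

    nu puts mass 1/num_atoms on each of num_atoms ordered atoms. The function
    with index i (0 <= i <= num_atoms) is the indicator that contains exactly
    the first i atoms, so the squared L_2(nu) distance between the functions
    with indices i and j equals |i - j| / num_atoms.
    """
    centers = [0]
    for i in range(1, num_atoms + 1):
        # Open a new center as soon as the last one is no longer eps-close.
        if Fraction(i - centers[-1], num_atoms) >= eps_sq:
            centers.append(i)
    return centers

def squared_distance_to_net(i, centers, num_atoms):
    """Squared L_2(nu) distance from the function with index i to the net."""
    return min(Fraction(abs(i - c), num_atoms) for c in centers)

# eps = 1/4, i.e. eps^2 = 1/16, on 1000 atoms.
centers = greedy_net(1000, Fraction(1, 16))
```

Consecutive centers are at squared distance at least $\varepsilon^2$, so the net has at most $1+\varepsilon^{-2}\le2\varepsilon^{-2}$ elements; for this measure the class behaves like an $L_2$-dense class with parameter 2 and exponent 2.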

In this paper we shall work with such classes of functions $\mathcal F$ which contain only
functions with absolute value less than or equal to 1. In this case $\mathcal F$ is an $L_p$-dense
class of functions for all $1\le p<\infty$ (with an exponent and a parameter depending
on $p$) if there is a number $1\le p<\infty$ for which it is $L_p$-dense. We shall formulate our
statements mainly for $L_p$-dense classes of functions with the parameter $p=2$, since this
seems to be the most convenient choice. Our main result is the following

Theorem. Let us have a non-atomic measure $\mu$ on the space $(X,\mathcal X)$ together with an
$L_2$-dense class $\mathcal F$ of functions $f=f(x_1,\dots,x_k)$ of $k$ variables with some parameter
$D$ and exponent $L$ on the product space $(X^k,\mathcal X^k)$ which consists of at most countably
many functions, and satisfies the conditions

$$\|f\|_\infty=\sup_{x_j\in X,\ 1\le j\le k}|f(x_1,\dots,x_k)|\le1\quad\text{for all } f\in\mathcal F \tag{1.2}$$

and

$$\|f\|_2^2=Ef^2(\xi_1,\dots,\xi_k)=\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)\le\sigma^2\quad\text{for all } f\in\mathcal F \tag{1.3}$$

with some constant $\sigma>0$. Let us also assume that the parameter $D$ of the $L_2$-dense
class $\mathcal F$ satisfies the condition

$$D\le n^\beta\quad\text{with some }\beta\ge0. \tag{1.4}$$

Then there exist some constants $C=C(k)>0$, $\alpha=\alpha(k)>0$ and $M=M(k)>0$
depending only on the parameter $k$ such that the supremum of the random integrals
$J_{n,k}(f)$, $f\in\mathcal F$, defined by formula (1.1) satisfies the inequality

$$P\left(\sup_{f\in\mathcal F}|J_{n,k}(f)|>x\right)\le CD\exp\left\{-\alpha\left(\frac x\sigma\right)^{2/k}\right\}\quad\text{if } n\sigma^2\ge\left(\frac x\sigma\right)^{2/k}\ge M(L+\beta+1)^{3/2}\log\frac2\sigma, \tag{1.5}$$

where $\beta$ is the number in (1.4), and the number $D$ in formula (1.5) agrees with the
parameter of the $L_2$-dense class $\mathcal F$.

The condition that $\mathcal F$ is a countable class of functions can be weakened. To formulate
such a result the following definition will be introduced.

Definition of countable majorizability. A class of functions $\mathcal F$ is countably
majorizable in the space $(X^k,\mathcal X^k,\mu^k)$ if there exists a countable subset $\mathcal F'\subset\mathcal F$ such
that for all numbers $x>0$ the sets $A(x)=\{\omega\colon\sup_{f\in\mathcal F}|J_{n,k}(f)(\omega)|>x\}$ and
$B(x)=\{\omega\colon\sup_{f\in\mathcal F'}|J_{n,k}(f)(\omega)|>x\}$ satisfy the identity $P(A(x)\setminus B(x))=0$.

Clearly, $B(x)\subset A(x)$. In the above definition we demanded that for all $x>0$
the set $B(x)$ is almost as large as $A(x)$. Now the following corollary of the Theorem is
given.

Corollary 1 of the Theorem. Let a class of functions $\mathcal F$ satisfy the conditions of
the Theorem with the only exception that instead of the condition about the countable
cardinality of $\mathcal F$ it is assumed that $\mathcal F$ is countably majorizable in the space $(X^k,\mathcal X^k,\mu^k)$.
Then $\mathcal F$ satisfies the Theorem.

The condition that the class of functions $\mathcal F$ is countable was imposed to avoid
some unpleasant measure theoretical difficulties which would arise if we had to work
with possibly non-measurable sets. On the other hand, I have the impression that
Corollary 1 can be applied in all investigations where an estimate about the supremum
of multiple random integrals with respect to a normalized empirical measure is needed.
It is not difficult to prove that Corollary 1 follows from the Theorem. To do this we
have to show that if $\mathcal F$ is an $L_2$-dense class with some parameter $D$ and exponent $L$,
and $\mathcal F'\subset\mathcal F$, then $\mathcal F'$ is also an $L_2$-dense class with the same exponent $L$, only with a
possibly different parameter $D'$.

To prove this statement let us choose for all numbers $1\ge\varepsilon>0$ and probability
measures $\nu$ on $(Y,\mathcal Y)$ some functions $f_1,\dots,f_m\in\mathcal F$ with $m\le D\left(\frac\varepsilon2\right)^{-L}$ elements, such
that the sets $\mathcal D_j=\left\{f\colon\int|f-f_j|^2\,d\nu\le\left(\frac\varepsilon2\right)^2\right\}$ satisfy the relation $\bigcup_{j=1}^m\mathcal D_j=\mathcal F$. For
all sets $\mathcal D_j$ for which $\mathcal D_j\cap\mathcal F'$ is non-empty choose a function $f'_j\in\mathcal D_j\cap\mathcal F'$. In such a
way we get a collection of functions $f'_j$ from the class $\mathcal F'$ containing at most $2^LD\varepsilon^{-L}$
elements which satisfies the condition imposed for $L_2$-dense classes with exponent $L$
and parameter $2^LD$ for this number $\varepsilon$ and measure $\nu$.
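
The step left implicit in the argument above is a triangle inequality in $L_2(\nu)$. The following short derivation spells it out; the notation $f'_j$ for the function chosen from $\mathcal D_j\cap\mathcal F'$ is the one used in the preceding paragraph, and the norm is the $L_2(\nu)$ norm.

```latex
% Take any f in F'. It lies in some D_j, and then D_j \cap F' is non-empty,
% so the function f'_j was chosen. Both f and f'_j are in D_j, hence
\[
  \|f-f'_j\|_{L_2(\nu)}
  \le \|f-f_j\|_{L_2(\nu)}+\|f_j-f'_j\|_{L_2(\nu)}
  \le \frac\varepsilon2+\frac\varepsilon2=\varepsilon .
\]
% The number of the chosen functions f'_j is at most
\[
  m\le D\left(\frac\varepsilon2\right)^{-L}=2^L D\,\varepsilon^{-L}.
\]
```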

The following Corollary of the Theorem may be of special interest. It is similar to 
some results of paper [2] or Theorem 5.3.14 in [4]. 

Corollary 2 of the Theorem. Let us consider a non-atomic probability measure $\mu$
on a measure space $(X,\mathcal X)$ and an $L_2$-dense class $\mathcal F$ of functions on the $k$-fold product
space $(X^k,\mathcal X^k)$ with some exponent $L$ and parameter $D$ which is either countable or
countably majorizable. Let us also assume that $\sup_{x_j\in X,\ 1\le j\le k}|f(x_1,\dots,x_k)|\le1$ for all
$f\in\mathcal F$. Then the supremum of the random stochastic integrals $J_{n,k}(f)$, $f\in\mathcal F$, satisfies
the inequality

$$P\left(\sup_{f\in\mathcal F}|J_{n,k}(f)|>x\right)\le Ce^{-\alpha x^{2/k}} \tag{1.6}$$

for all $x>0$ with some constants $\alpha=\alpha(k)>0$ and $C=C(k,D,L)$ depending on the
parameter $k$ and on the exponent and parameter of the $L_2$-dense class $\mathcal F$.
Proof of Corollary 2. Let us first assume that $D\le n$, and apply the result of the
Theorem with $\sigma=1$. Then conditions (1.3) and (1.4) of the Theorem hold (the latter
with $\beta=1$). Also the first part of the condition in (1.5) holds if $x\le n^{k/2}$, and
$P(|J_{n,k}(f)|>x)=0$ if $x>n^{k/2}$. The second condition of (1.5) is satisfied if
$x^{2/k}\ge M(L+\beta+1)^{3/2}\log2$,
hence relation (1.6) holds if $x>\text{const.}$ with an appropriate constant. If the number
$C$ in (1.6) is chosen sufficiently large, then the right-hand side of (1.6) is greater than
1 for $x\le\text{const.}$ In the case $n\le D$ the random integral $|J_{n,k}(f)|$ is at most $2^kn^{k/2}\le2^kD^{k/2}$.
Hence the statement of Corollary 2 holds for all $x>0$ with an appropriate choice of the
parameter $C$.

In the Theorem we have considered the supremum of multiple random integrals for
a nice class of functions of $k$ variables with respect to the $k$-fold product of a normalized
empirical measure. It was shown that if the variances of the random integrals we have
considered are less than some number $\sigma^2>0$, then under some additional conditions this
supremum takes a value larger than $x$ with a probability less than $P(C\sigma\eta^k>x)$, where $\eta$
is a standard normal random variable, and $C=C(k)$ is a universal constant depending
only on the multiplicity $k$ of the random integrals. This is the sharpest estimate we
can expect. Moreover, this estimate seems to be sharp also in the respect that the
conditions imposed for its validity cannot be considerably weakened. If condition (1.2)
does not hold or $n\sigma^2<\left(\frac x\sigma\right)^{2/k}$, then the estimate of the Theorem may not hold any
longer even if the class of functions $\mathcal F$ contains only one function. In such cases there
exist examples for which the probability $P(J_{n,k}(f)>x)$ is too large. Indeed, in such
cases it may happen that the values of relatively few members of the sample make the
random integral larger than $x$ with relatively large probability, and the remaining part
of the sample does not diminish it. Here I do not work out the details of such examples.

If $\left(\frac x\sigma\right)^{2/k}<M\log\frac2\sigma$ with a not too large number $M>0$, then the estimate
of the Theorem may be violated again, but in this case the reason for it is that the
supremum of many small random variables may be large. To understand this let us
consider the following analogous problem. Take a Wiener process $W(t)$, $0\le t\le1$, and
consider the supremum of the expressions $W(t)-W(s)=\int f_{s,t}(u)W(du)=J(f_{s,t})$,
with the functions $f_{s,t}(\cdot)$ on the interval $[0,1]$ defined by the formula $f_{s,t}(u)=1$ if
$s\le u\le t$, $f_{s,t}(u)=0$ if $0\le u<s$ or $t<u\le1$. If we consider the class of
functions $\mathcal F_\sigma=\{f_{s,t}\colon\int f_{s,t}^2(u)\,du=t-s\le\sigma^2\}$, then it is natural to expect that
$P\left(\sup_{f_{s,t}\in\mathcal F_\sigma}J(f_{s,t})>x\right)\le e^{-\text{const.}(x/\sigma)^2}$. However, this relation does not hold if
$x=x(\sigma)<(1-\varepsilon)\sqrt{2\log\frac1\sigma}\,\sigma$ with some $\varepsilon>0$. In such cases
$P\left(\sup_{f_{s,t}\in\mathcal F_\sigma}J(f_{s,t})>x\right)\to1$, as $\sigma\to0$.
This can be proved relatively simply with the help of the estimate
$P(J(f_{s,t})>x(\sigma))\ge\text{const.}\,\sigma^{1-\varepsilon}$ if $|t-s|=\sigma^2$ and the independence of the random integrals $J(f_{s,t})$
if the functions $f_{s,t}$ are indexed by such pairs $(s,t)$ for which the intervals $(s,t)$ are
disjoint.
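
The effect described above can be checked numerically without simulation. The sketch below assumes that $[0,1]$ is split into $m\approx\sigma^{-2}$ disjoint intervals of length $\sigma^2$, so that the corresponding increments of the Wiener process are independent $N(0,\sigma^2)$ variables, and evaluates the exact probability that at least one of them exceeds $x(\sigma)=(1-\varepsilon)\sqrt{2\log\frac1\sigma}\,\sigma$. This is a lower bound for the probability of the supremum over the whole class; the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_some_increment_exceeds(sigma, eps):
    """P(max of m independent N(0, sigma^2) increments > x(sigma)).

    m = round(1 / sigma^2) disjoint intervals of length sigma^2 are used,
    and x(sigma) = (1 - eps) * sqrt(2 * log(1 / sigma)) * sigma.
    """
    m = round(1.0 / sigma**2)
    level = (1.0 - eps) * math.sqrt(2.0 * math.log(1.0 / sigma))  # x(sigma) / sigma
    return 1.0 - normal_cdf(level) ** m

p_coarse = prob_some_increment_exceeds(0.1, 0.1)
p_fine = prob_some_increment_exceeds(0.01, 0.1)
```

With $\varepsilon=0.1$ the probability already increases from $\sigma=0.1$ to $\sigma=0.01$ and is very close to 1 in the latter case, in agreement with the limit stated in the text.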

Some additional work would show that a similar picture arises if we integrate with
respect to the normalized empirical measure of a sample with uniform distribution on
the interval $[0,1]$ instead of a Wiener process. This yields an example for an $L_2$-dense
class of functions in the case $k=1$ for which the estimate of the Theorem does not
hold any longer if $\left(\frac x\sigma\right)^{2/k}<M\log\frac2\sigma$ with some $M<\sqrt2$. At a heuristic level it is clear
that such an example can be given also for $k>1$, and the number $M$ in condition (1.5)
has to be chosen larger if we want the Theorem to hold also for an $L_2$-dense class
of functions $\mathcal F$ with a large exponent $L$. (In this paper I did not try to find the best
possible condition of the Theorem in the right-hand side inequality of (1.5).)

One would like to see some interesting examples when the Theorem is applicable
and to have some methods to check the conditions of the Theorem. It is useful to know
that if $\mathcal F$ is a Vapnik-Červonenkis class of functions whose absolute values are bounded
by 1, then $\mathcal F$ is an $L_2$-dense class.

To formulate the above statement more explicitly let us recall that a class of subsets
$\mathcal D$ of a set $S$ is a Vapnik-Červonenkis class if there exist some constants $B>0$ and
$K>0$ such that for all integers $n$ and sets $S_0(n)=\{x_1,\dots,x_n\}\subset S$ of cardinality
$n$ the collection of sets of the form $S_0(n)\cap D$, $D\in\mathcal D$, contains no more than $Bn^K$
subsets of $S_0(n)$. A class of real valued functions $\mathcal F$ on a space $(Y,\mathcal Y)$ is a
Vapnik-Červonenkis class if the graphs of these functions constitute a Vapnik-Červonenkis class, i.e. if
the sets $A(f)=\{(y,t)\colon y\in Y,\ \min(0,f(y))\le t\le\max(0,f(y))\}$, $f\in\mathcal F$, constitute a
Vapnik-Červonenkis class of sets on the product space $Y\times R^1$.
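
As a small illustration of the counting condition in this definition, the sketch below enumerates the subsets of a finite point set that can be cut out by closed intervals of the real line. The class of intervals is an example chosen here, not one discussed in the paper; for $n$ distinct points it produces $n(n+1)/2+1$ subsets, so the condition holds for instance with $B=2$ and $K=2$.

```python
def traces_of_intervals(points):
    """All subsets of `points` of the form points ∩ [s, t], s <= t, plus the empty set."""
    pts = sorted(points)
    traces = {frozenset()}
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            # An interval picks out a run of consecutive points in sorted order.
            traces.add(frozenset(pts[i:j + 1]))
    return traces

counts = {n: len(traces_of_intervals(range(n))) for n in range(1, 9)}
```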

An important result of Dudley states that a Vapnik-Červonenkis class of functions
whose absolute values are bounded by 1 is an $L_1$-dense class. The parameter and
exponent of this $L_1$-dense class can be bounded by means of the constants $B$ and $K$
appearing in the definition of Vapnik-Červonenkis classes. On the other hand, an
$L_1$-dense class of functions bounded by 1 is also an $L_2$-dense class (with possibly different
exponent and parameter), since $\int|f-g|^2\,d\nu\le2\int|f-g|\,d\nu$ in this case. Dudley's result,
whose proof can be found e.g. in Chapter II of Pollard's book [9] (the 25° approximation
lemma contains this result in a slightly more general form) is useful for us, because
there are results which enable us to prove that certain classes of functions constitute a
Vapnik-Červonenkis class.
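
The passage from $L_1$-density to $L_2$-density mentioned above can be made explicit as follows. The resulting parameter and exponent are one admissible choice obtained from this simple argument, not necessarily the best possible one.

```latex
% For functions bounded by 1 we have |f-g| \le 2, hence
\[
  \int |f-g|^2\,d\nu \le 2\int |f-g|\,d\nu .
\]
% If \{g_1,\dots,g_m\} is an (\varepsilon^2/2)-dense subset in L_1(\nu), then for all f
\[
  \inf_{1\le j\le m}\int |f-g_j|^2\,d\nu
  \le 2\inf_{1\le j\le m}\int |f-g_j|\,d\nu \le \varepsilon^2 ,
\]
% and for an L_1-dense class with parameter D and exponent L
\[
  m\le D\left(\frac{\varepsilon^2}{2}\right)^{-L}=2^L D\,\varepsilon^{-2L},
\]
% i.e. the class is L_2-dense with parameter 2^L D and exponent 2L.
```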

This work is a continuation of my paper [8], where this question was discussed in 
the special case when $\mathcal F$ contains only one function. Here Theorem 1 (or its equivalent
version Theorem 1') of [8] will be applied, but no additional argument of that work 
is needed. As I have mentioned in [8], the investigation of this paper was motivated 
by some non-parametric maximum likelihood estimate problems. Earlier I could only 
prove a much weaker version of this result in [7]. 

I found some results similar to that of this paper in the work of Arcones and Giné [2],
where the tail-behaviour of the supremum of degenerate U-statistics was investigated if
the kernel functions of these U-statistics constitute a Vapnik-Červonenkis class. But the
bounds of that paper do not give a better estimate if we have the additional information
that the variances of the U-statistics we consider are small. On the other hand, one
of the main goals of the present paper was to prove such estimates. (Let me remark
that formula (1.3) imposes a condition on the variances of the random integrals we
consider in this paper. See Lemma 3 in [8].) I know of one work where the dependence
of the estimate on the variance was investigated in a similar case. This is Alexander's
paper [1], where the problem of the present paper was studied in the special case $k=1$.
Alexander proved in this case a sharper result. He also studied the case of non-identically
distributed random variables and gave an upper bound for the distribution function of
the supremum of random integrals with almost as good constants as in the case of a single
random integral. Probably a similar result also holds for multiple stochastic integrals,
but the proof requires a more careful analysis. Alexander's paper was interesting for me
first of all, because I learned some ideas from it which I strongly needed in the present
work. On the other hand, I also needed some new arguments, because in the study of
multiple stochastic integrals some new difficulties had to be overcome.

This paper consists of six sections and an appendix. In Section 2 the Theorem is 
reduced to a simpler statement formulated in Proposition 3. Section 3 contains some 
important results needed in the proof, and the main ideas of the proof are explained 
there. In particular, the proof of Proposition 3 is reduced to another statement for- 
mulated in Proposition 4. Proposition 4 is proved simultaneously with another result 
described in Proposition 5. To make the proof more transparent first I give it in the 
special case k = 1 in Section 4. Sections 5 and 6 contain the proof of Propositions 4 
and 5 in the general case. In Section 5 it is shown how a symmetrization argument can 
be applied to prove these results, and finally the proof is completed in Section 6. The 
Appendix contains the proof of an estimate about the tail behaviour of the distribution 
of homogeneous polynomials of Rademacher functions. 

2. Reduction of the Theorem to a simpler result 

I shall prove the following result with the help of a natural argument, called the chaining
argument in the literature, and of Theorem 1' in paper [8].

Proposition 1. Let us fix some number $\bar A\ge2^k$, and assume that a class of functions
$\mathcal F$ satisfies the conditions of the Theorem with a number $M$ in these conditions which
may depend also on $\bar A$. Then a number $0<\bar\sigma\le\sigma\le1$ and a collection of functions
$\mathcal F_{\bar\sigma}=\{f_1,\dots,f_m\}\subset\mathcal F$ with $m\le D\bar\sigma^{-L}$ elements can be chosen in such a way that the
sets $\mathcal D_j=\{f\colon f\in\mathcal F,\ \int|f-f_j|^2\,d\mu\le\bar\sigma^2\}$, $1\le j\le m$, satisfy the relation $\bigcup_{j=1}^m\mathcal D_j=\mathcal F$,

and

P(;up \J,Af)\>j) <2CDe, P {-a(^-y" ! } ^ 

if na z> (*) >ML\og- 

with the constants $\alpha=\alpha(k)$, $C=C(k)$ appearing in Theorem 1' of [8] and the exponent
$L$ and parameter $D$ of the $L_2$-dense class $\mathcal F$ if the constant $M=M(k,\bar A)$ is chosen
sufficiently large. Beside this, also the inequalities 64 (~~) 2 ^ k > na 2 > (^r) 2 ^ o,nd 

n - 2 > M^(L+l3+l)lo g n hol ^ promded that na 2 > > M (L + (3 + if' 2 log J . 

Remark: The introduction of the number $\bar A\ge2^k$ in Proposition 1 may seem a bit
artificial. Its role is to guarantee that such a number $\bar\sigma$ could be defined in Proposition 1
which satisfies the inequality $\left(\frac x{\bar\sigma}\right)^{2/k}\ge An\bar\sigma^2$ with a sufficiently large previously fixed
constant $A=A(k)$.

Proof of Proposition 1. For all $p=0,1,2,\dots$ choose a set $\mathcal F_p=\{f_{p,1},\dots,f_{p,m_p}\}\subset\mathcal F$
with $m_p\le D4^{pL}\sigma^{-L}$ elements in such a way that $\inf_{1\le j\le m_p}\int(f-f_{p,j})^2\,d\mu\le16^{-p}\sigma^2$ for
all $f\in\mathcal F$. For all pairs $(j,p)$, $p=1,2,\dots$, $1\le j\le m_p$, choose a predecessor $(j',p-1)$,
$j'=j'(j,p)$, $1\le j'\le m_{p-1}$, in such a way that the functions $f_{j,p}$ and $f_{j',p-1}$ satisfy the

relation / |/ iiP - fy^^da < a 2 lQ~ p . Then we have / ( /f ') d/x < \a 2 l&~ p 



and sup 

x 3 eX, l<j<fc 



fj }P (x 1 ,...,x k )-f jlp _ 1 (x 1 ,...,x k ) 



< 1. Theorem 1' of [8] yields that 

2/fc" 



/ 2~( 1+p) x\ i /2 p_1 x\ 

P(A(j, P )) = p [\JnAhp - fr, P -i)\ > -^—) ^ Cex P \ ~ a {-^a-J 



no 



2' 



2~ 4p / 2 p ~^x\^^ 



if - A > l<j'<m p , p=l,2,..., (2.2) 



and 

P(5(a)) = P (| J n ,k(fo,s)\ >^£j< Ce W |-« (^) 2A } , l<s<m, 



if na 2 > t *\ 

~ \2AoJ 



2/k 



(2.3) 



Choose the integer number R, R > 0, in such a way that 2( 4+2 / /c ^- R+1 ) (^) > 

_n^_ > 2 (4+2/fc)fl^)2/* j define ^2 = 16 -it a 2 &nd j?_ = j: r (Ag n(J 2 > ^jV* 

and $\bar A\ge2^k$ by our conditions, there exists such a non-negative number $R$.) Then the
cardinality $m$ of the set $\mathcal F_{\bar\sigma}$ is clearly not greater than $D\bar\sigma^{-L}$, and $\bigcup_{j=1}^m\mathcal D_j=\mathcal F$. Beside
this, the number $R$ was chosen in such a way that the inequalities (2.2) and (2.3) can
be applied for $1\le p\le R$. Hence the definition of the predecessor of a pair $(j,p)$ implies
that



R rn p m 

P ( sup |J B>fc (/)| > \ ) < P ( |J |J A(j,p) U |J B(s) 



p=l j=l s=l 



^ E E p)) + E ^ E CD 4PL °~ L ex p -« -r^ 

p=lj = l s=l p=l I v a / 

+ C0 .-exp|-a( n -) ). 

If the condition (f ) 2 ^ > M(L + l) 3 / 2 log | holds with a sufficiently large constant M 
(depending on A), then the inequalities 

4 ,^ exp {_ Q (^) 2/ n< 2 -, exp j_ a (l^) 2/ ' 



hold for all p = 1, 2, . . . , and 

a ~ L 6XP {- a {^) 2,k } ^ 6XP {~ a {jA^) 2/k } ■ 
Hence the previous estimate implies that 

p (;up \J n ,M)\ > f ) < g wh-«p {-a. (||-) 2/t } 

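and relation (2.1) holds. We have

$$n\bar\sigma^2=2^{-4R}n\sigma^2\le2^{-4R}\cdot2^{(4+2/k)(R+1)+2-2/k}\left(\frac x{\bar A\sigma}\right)^{2/k}=2^6\cdot2^{2R/k}\left(\frac x{\bar A\sigma}\right)^{2/k},$$

hence $n\bar\sigma^2\le2^6\left(\frac x{\bar A\bar\sigma}\right)^{2/k}$. Beside this,

$$n\bar\sigma^2=2^{-4R}n\sigma^2\ge2^{2-2/k}\cdot2^{-4R}\cdot2^{4R+2R/k}\left(\frac x{\bar A\sigma}\right)^{2/k}\ge\left(\frac x{\bar A\sigma}\right)^{2/k}.$$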

It remained to show that na 2 > — — ^5^ff/3° S n+1 ^ ■ 

This inequality clearly holds under the conditions of Proposition 1 if a < n -1 / 3 , 
since in this case log f > ^fp, and na 2 > {j^) 2/k > A~ 2 / k M(L + [3 + if/ 2 log f > 
\A- 2 / k M(L + [5 + l)\ogn. If a > n' 1 / 3 , then the inequality 2^ +2 / k ) R {j^) 2/k < 

V, „ ^2/fc^ 4 /(4+2/fc) 

holds. Hence 2~ 4R > 2 ( 2 " 2 / fc ))/( 4 + 2 A) 



na 

22-2/fc 



, and 



na 2 = 2~ 4R na 2 > ^(na 2 ) 1 ^ 



A 4 / 3 



X\ 2 / k ' 
(T 



7 ■ , 4 

with 7 = 



4+f 



2 



Since na 2 > (^) 2 / fc > M(L + /3 + 1) 3/2 , and na 2 > n 1 / 3 , the above estimates yield that 



2/3 



na 2 > A-^ina 2 ) 1 ' 3 [{^) ' \ > A^n 1 ^ (f ) Z/6 (L + (3 + 1) > M ^ + /+ 3 1)logw . 

Now I formulate Proposition 2 and show that the Theorem follows from
Propositions 1 and 2.

Proposition 2. Let us have a non-atomic measure $\mu$ on the space $(X,\mathcal X)$ together
with a sequence of independent and $\mu$ distributed random variables $\xi_1,\dots,\xi_n$ and an
$L_2$-dense class $\mathcal F$ of functions $f=f(x_1,\dots,x_k)$ of $k$ variables with some parameter $D$
and exponent $L$ on the product space $(X^k,\mathcal X^k)$ which consists of at most countably
many functions and satisfies conditions (1.2), (1.3) and (1.4) with some $\sigma>0$, and
$n\sigma^2>K((L+\beta)\log n+1)$ with a sufficiently large number $K=K(k)$. Then there
exists some number $\gamma=\gamma(k)>0$ and threshold index $A_0=A_0(k)>0$ depending only
on the order $k$ of the stochastic integrals we consider such that

$$P\left(\sup_{f\in\mathcal F}|J_{n,k}(f)|\ge An^{k/2}\sigma^{k+1}\right)\le e^{-\gamma A^{1/2k}n\sigma^2}\quad\text{if } A\ge A_0.$$



In the proof of the Theorem we exploit our freedom in the choice of the parameters 
in Propositions 1 and 2. Let us choose a number A such that A > Aq and r yA^ 2k > 
with the numbers Aq, K and 7 in Proposition 2. Proposition 1 will be applied with 

such a number A for which the inequalities (f) 2 ^ > ^4~ n ^ 2 — {^Ao) 2 l k na' 2 hold with 
the above fixed parameter Aq and the number a defined in the proof of Proposition 1. 
(Here and in the sequel we shall assume that the number x satisfies the condition na 2 > 

(^) 2 ^ k > M(L + j3 + l) 3 / 2 log ^ imposed both in Proposition 1 and in the Theorem.) 
Choose such a number M in Proposition 1 (and as a consequence in the Theorem too) for 

which also the inequality na 2 > m2/3( ^ q + /3 4 % 1) logn > K((L + f3) logn + 1) holds with the 
number K appearing in the conditions of Proposition 2. Proposition 1 will be applied 
with the class of functions J 7 , the numbers a and M considered in the Theorem and 
a number A satisfying the above property while Proposition 2 with the above chosen 
number Aq, the number a and the sets of functions T>j defined in Proposition 1. More 
precisely, we apply Proposition 2 for the sets of functions ^—^ JL where g G Vj and fj 
is the 'center' of the set Vj appearing in the definition of the set Vj in Proposition 1. 
Observe that these functions constitute an L 2 -dense class of functions with exponent L 
and parameter D. 

Since (l — ~) x > | > 2Aon k / 2 a k+1 Propositions 1 and 2 with the above parame- 
ters yield that 



P [ sup \J n , k (f)\ >x) <P{ sup |J n , fc (/)| > 4 

Jn,k 



J = l 



sup 



fj ~ 9 



> A n k / 2 a k+1 (2-4) 



< 2CDe,p {-« (^) ''"J + Da~ L e-^ /2k ^. 



Let us understand how the second term at the right-hand side of (2.4) can be estimated. 
The condition na 2 > K((L + 0) logn + 1) implies that a > n -1 / 2 , and by our choice 
of A we have 7 lJ /2fc na 2 > ±na 2 > Llogn > 2LlogJ, i.e. o~ L < e 7^ /2fe ^ 2 /2. 
As we have seen in Proposition 1 $n\bar\sigma^2\ge\left(\frac x{\bar A\sigma}\right)^{2/k}$. The above relations imply that

a-L e -iA^ 

gives that 



.Act, 

--L^Al^n-a* < ^A^^/2 < exp j _ ^i/2fc A - 2/k (£) 2 / fc j. Then relation (2.4) 



P ^sup \J n , k (f)\ > x^j < 2CDexp j 



a /x\ 2 / k 



(4A)V k W 

+ Bexp (_l, r ^ ( £)-} 



The last formula means that under the conditions of the Theorem formula (1.5) 
holds (with some new appropriately defined constant a > 0), and this is what we had 
to prove. 

It remained to prove Proposition 2. Its proof requires some new ideas, and the
remaining part of the paper deals with this problem. There is a counterpart of this
result about so-called degenerate U-statistics. The study of degenerate U-statistics is
technically simpler. Hence I formulate this result about U-statistics in Proposition 3
and show that it implies Proposition 2.

First I recall some notions we need to formulate Proposition 3. Let us have a
sequence of independent and identically distributed random variables $\xi_1,\xi_2,\dots$ with
distribution $\mu$ on a measurable space $(X,\mathcal X)$ together with a function $f=f(x_1,\dots,x_k)$
on the $k$-th power $(X^k,\mathcal X^k)$ of the space $(X,\mathcal X)$. We define with their help the
U-statistic $I_{n,k}(f)$ of order $k$, as

$$I_{n,k}(f)=\frac1{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}f(\xi_{j_1},\dots,\xi_{j_k}). \tag{2.5}$$

(The function $f$ in this formula will be called the kernel function of the U-statistic.)
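
The definition of the U-statistic can be tried out on a small sample. The sketch below assumes that (2.5) is read as $\frac1{k!}$ times the sum of the kernel over all $k$-tuples of pairwise distinct sample indices; the sample and the kernel are hypothetical.

```python
from itertools import permutations
from math import factorial

def u_statistic(sample, kernel, k):
    """I_{n,k}(f): (1/k!) * sum of f over k-tuples of pairwise distinct indices."""
    total = sum(
        kernel(*(sample[j] for j in idx))
        for idx in permutations(range(len(sample)), k)
    )
    return total / factorial(k)

# Kernel f(x, y) = x * y on the sample (1, -1, 1):
# the sum over ordered pairs with distinct indices is (sum x)^2 - sum x^2 = 1 - 3 = -2.
value = u_statistic([1, -1, 1], lambda x, y: x * y, 2)
```

The enumeration costs of the order $n^k$ kernel evaluations, which is fine for an illustration but not for large samples.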

A real valued function $f=f(x_1,\dots,x_k)$ on the $k$-th power $(X^k,\mathcal X^k)$ of a space
$(X,\mathcal X)$ is called a canonical kernel function (with respect to the probability measure $\mu$
on the space $(X,\mathcal X)$) if

$$\int f(x_1,\dots,x_{j-1},u,x_{j+1},\dots,x_k)\,\mu(du)=0\quad\text{for all }1\le j\le k\text{ and }x_s\in X,\ s\ne j.$$

Let me also introduce the notion of canonical functions in a more general case, because
this notion appears later in Proposition 5. We call a function $f(x_1,\dots,x_k)$ on the
$k$-fold product $(X_1\times\dots\times X_k,\mathcal X_1\times\dots\times\mathcal X_k,\mu_1\times\dots\times\mu_k)$ of $k$ not necessarily identical
probability spaces $(X_j,\mathcal X_j,\mu_j)$, $1\le j\le k$, canonical if

$$\int f(x_1,\dots,x_{j-1},u,x_{j+1},\dots,x_k)\,\mu_j(du)=0\quad\text{for all }1\le j\le k\text{ and }x_s\in X_s,\ s\ne j.$$
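
On a finite space the canonical property can be verified exactly. The following sketch checks the defining identity for a kernel with respect to a discrete probability measure given as a dictionary; the two-point measure and the kernels $xy$ and $x+y$ are illustrative choices, not examples from the paper.

```python
from fractions import Fraction
from itertools import product

def is_canonical(kernel, k, measure):
    """Check that integrating out any single argument of `kernel` gives 0.

    measure: dict mapping the points of a finite space to their probabilities.
    """
    points = list(measure)
    for j in range(k):
        # Fix the remaining k - 1 arguments in every possible way.
        for others in product(points, repeat=k - 1):
            integral = sum(
                kernel(*(others[:j] + (u,) + others[j:])) * p
                for u, p in measure.items()
            )
            if integral != 0:
                return False
    return True

mu = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
product_kernel_ok = is_canonical(lambda x, y: x * y, 2, mu)
sum_kernel_ok = is_canonical(lambda x, y: x + y, 2, mu)
```

The kernel $xy$ is canonical because $\int u\,\mu(du)=0$, while for $x+y$ the integral over one argument leaves the other argument behind.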
A U-statistic with a canonical kernel function is called degenerate. Now I formulate
Proposition 3.

Proposition 3. Let us have a probability measure $\mu$ on a space $(X,\mathcal X)$ together with a
sequence of independent and $\mu$ distributed random variables $\xi_1,\dots,\xi_n$ and an $L_2$-dense
class $\mathcal F$ of canonical kernel functions $f=f(x_1,\dots,x_k)$ (with respect to the measure $\mu$)
with some parameter $D$ and exponent $L$ on the product space $(X^k,\mathcal X^k)$ which consists
of at most countably many functions, and satisfies conditions (1.2), (1.3) and (1.4)
with some $\sigma>0$. Let $n\sigma^2>K((L+\beta)\log n+1)$ with a sufficiently large constant
$K=K(k)$. Then there exist some numbers $C=C(k)>0$, $\gamma=\gamma(k)>0$ and threshold
index $A_0=A_0(k)>0$ depending only on the order $k$ of the U-statistics we consider such
that the degenerate U-statistics $I_{n,k}(f)$, $f\in\mathcal F$, defined in (2.5) satisfy the inequality

$$P\left(\sup_{f\in\mathcal F}|n^{-k/2}I_{n,k}(f)|\ge An^{k/2}\sigma^{k+1}\right)\le Ce^{-\gamma A^{1/2k}n\sigma^2}\quad\text{if } A\ge A_0.$$



(The constants in Propositions 2 and 3 may be different.) Before deducing
Proposition 2 from Proposition 3 I formulate a simple lemma which will be useful also in the
subsequent part of the paper. To formulate it let us introduce the following notations.

Let some measurable spaces $(Y_1,\mathcal Y_1)$, $(Y_2,\mathcal Y_2)$ and $(Z,\mathcal Z)$ be given together with a
probability measure $\mu$ on the space $(Z,\mathcal Z)$. Consider a function $f(y_1,z,y_2)$ on the
product space $(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2)$, $y_1\in Y_1$, $z\in Z$, $y_2\in Y_2$, and define its
projections

$$P_\mu f(y_1,y_2)=\int f(y_1,z,y_2)\,\mu(dz),\qquad y_1\in Y_1,\ y_2\in Y_2, \tag{2.6}$$

$$\bar P_\mu f(y_1,z,y_2)=P_\mu f(y_1,y_2),\qquad y_1\in Y_1,\ z\in Z,\ y_2\in Y_2, \tag{2.6'}$$

and

$$Q_\mu f(y_1,z,y_2)=(I-\bar P_\mu)f(y_1,z,y_2)=f(y_1,z,y_2)-\bar P_\mu f(y_1,z,y_2),\qquad y_1\in Y_1,\ z\in Z,\ y_2\in Y_2. \tag{2.6''}$$

(The difference between the operators $P_\mu$ and $\bar P_\mu$ is that in the definition of the function
$\bar P_\mu f$ we introduced a fictive argument $z$, i.e. $P_\mu f$ is defined on the space $Y_1\times Y_2$ and
$\bar P_\mu f$ on the space $Y_1\times Z\times Y_2$.)

Lemma 1. Let us have some measurable spaces $(Y_1,\mathcal Y_1)$, $(Y_2,\mathcal Y_2)$ and $(Z,\mathcal Z)$, a
probability measure $\mu$ on the space $(Z,\mathcal Z)$ and a probability measure $\rho$ on the product space
$(Y_1\times Y_2,\mathcal Y_1\times\mathcal Y_2)$. The transformations $P_\mu$, $\bar P_\mu$ and $Q_\mu$ defined in (2.6)-(2.6'') are
contractions from the space $L_2(Y_1\times Z\times Y_2,\rho\times\mu)$ to the spaces $L_2(Y_1\times Y_2,\rho)$ and
$L_2(Y_1\times Z\times Y_2,\rho\times\mu)$ respectively, i.e.

$$\|P_\mu f\|^2_{L_2,\rho}=\int P_\mu f(y_1,y_2)^2\,\rho(dy_1,dy_2)=\|\bar P_\mu f\|^2_{L_2,\rho\times\mu}=\int\bar P_\mu f(y_1,z,y_2)^2\,\rho(dy_1,dy_2)\mu(dz)$$
$$\le\|f\|^2_{L_2,\rho\times\mu}=\int f(y_1,z,y_2)^2\,\rho(dy_1,dy_2)\mu(dz), \tag{2.7}$$

and

$$\|Q_\mu f\|^2_{L_2,\rho\times\mu}=\int Q_\mu f(y_1,z,y_2)^2\,\rho(dy_1,dy_2)\mu(dz)=\int\left(f(y_1,z,y_2)-\bar P_\mu f(y_1,z,y_2)\right)^2\rho(dy_1,dy_2)\mu(dz)$$
$$\le\|f\|^2_{L_2,\rho\times\mu}=\int f(y_1,z,y_2)^2\,\rho(dy_1,dy_2)\mu(dz). \tag{2.7'}$$

If $\mathcal F$ is an $L_2$-dense class of functions $f(y_1,z,y_2)$ on the product space
$(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2)$, $y_1\in Y_1$, $z\in Z$, $y_2\in Y_2$, with parameter $D$ and exponent $L$, then
also the classes $\mathcal F_\mu=\{P_\mu f\colon f\in\mathcal F\}$ and $\bar{\mathcal F}_\mu=\{\bar P_\mu f\colon f\in\mathcal F\}$ with the functions
$P_\mu f$ and $\bar P_\mu f$ defined in formulas (2.6) and (2.6') are $L_2$-dense classes with parameter
$D$ and exponent $L$ in the spaces $(Y_1\times Y_2,\mathcal Y_1\times\mathcal Y_2)$ and $(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2)$
respectively, and the class $\mathcal G_\mu=\{Q_\mu f=f-\bar P_\mu f\colon f\in\mathcal F\}$ defined with the help of (2.6'') is an $L_2$-dense
class with parameter $2^LD$ and exponent $L$ in the space $(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2)$.
Moreover, the class of functions $\mathcal G'_\mu=\{\frac12(f-\bar P_\mu f)\colon f\in\mathcal F\}$ is an $L_2$-dense class with
exponent $L$ and parameter $D$.
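
The contraction properties in (2.7) and (2.7') can be checked exactly on finite spaces. In the sketch below the pair $(y_1,y_2)$ is merged into a single variable $y$, and the measures $\rho$, $\mu$ and the function $f$ are small hypothetical examples with rational values, so that all norms are computed without rounding.

```python
from fractions import Fraction

def project(f, mu):
    """P_mu f(y) = sum over z of f(y, z) * mu(z)."""
    return lambda y: sum(f(y, z) * p for z, p in mu.items())

def sq_norm(g, rho, mu):
    """Squared L_2(rho x mu) norm of a function g(y, z)."""
    return sum(g(y, z) ** 2 * r * p for y, r in rho.items() for z, p in mu.items())

rho = {0: Fraction(1, 3), 1: Fraction(2, 3)}   # measure on the y-space
mu = {0: Fraction(1, 4), 1: Fraction(3, 4)}    # measure on the z-space
f = lambda y, z: Fraction(y + 2 * z - 1, 2)    # values in [-1/2, 1]

Pf = project(f, mu)
P_bar_f = lambda y, z: Pf(y)                   # P_mu f with a fictive argument z
Qf = lambda y, z: f(y, z) - Pf(y)              # Q_mu f = f - P_bar_mu f

norm_f = sq_norm(f, rho, mu)
norm_P = sq_norm(P_bar_f, rho, mu)
norm_Q = sq_norm(Qf, rho, mu)
```

In this example the two squared norms even add up to that of $f$, since $Q_\mu f$ is orthogonal to $\bar P_\mu f$ in $L_2(\rho\times\mu)$; the lemma only uses the weaker statement that each of them is at most $\|f\|^2_{L_2,\rho\times\mu}$.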

Proof of Lemma 1. The Schwarz inequality yields that $P_\mu f(y_1,y_2)^2\le\int f(y_1,z,y_2)^2\,\mu(dz)$,
and the inequality $\int[f(y_1,z,y_2)-\bar P_\mu f(y_1,z,y_2)]^2\,\mu(dz)\le\int f(y_1,z,y_2)^2\,\mu(dz)$ also holds.
Integrating these inequalities with respect to the probability measure $\rho(dy_1,dy_2)$ we
get formulas (2.7) and (2.7').

Let us consider an arbitrary probability measure $\rho$ on the space $(Y_1\times Y_2,\mathcal Y_1\times\mathcal Y_2)$.
To prove that $\mathcal F_\mu$ is an $L_2$-dense class we have to find $m\le D\varepsilon^{-L}$ functions $f_j\in\mathcal F_\mu$,
$1\le j\le m$, such that $\inf_{1\le j\le m}\int(f_j-f)^2\,d\rho\le\varepsilon^2$ for all $f\in\mathcal F_\mu$. But a similar property
holds in the space $Y_1\times Z\times Y_2$ with the probability measure $\rho\times\mu$. This property
together with the $L_2$ contraction property of $P_\mu$ formulated in (2.7) imply that $\mathcal F_\mu$ is
an $L_2$-dense class. The analogous property for $\bar{\mathcal F}_\mu$ follows from the already proved
$L_2$-density property of $\mathcal F_\mu$ and the fact that by replacing a measure $\rho$ on $Y_1\times Z\times Y_2$ by
the measure $\bar\rho\times\mu$, where $\bar\rho$ is the projection of the measure $\rho$ to the space $Y_1\times Y_2$, i.e.
$\bar\rho(B)=\rho(B\times Z)$ for $B\in\mathcal Y_1\times\mathcal Y_2$, we do not change the $L_2$ norm of a difference $\bar P_\mu f-\bar P_\mu g$,
$f,g\in\mathcal F$. Moreover, it equals the $L_2$ norm of the difference $P_\mu f-P_\mu g$ with respect
to the measure $\bar\rho$. Finally, the desired $L_2$-density property of the set $\mathcal G_\mu$ can be deduced
from the following observation. For any probability measure $\rho$ on the space $Y_1\times Z\times Y_2$
and pair of functions $f$ and $g$ such that $\int(f-g)^2\,\frac12(d\rho+d\bar\rho\times d\mu)\le\left(\frac\varepsilon2\right)^2$, where $\bar\rho$ is
the projection of the measure $\rho$ to the space $Y_1\times Y_2$,

$$\int\left((f-\bar P_\mu f)-(g-\bar P_\mu g)\right)^2d\rho\le2\int(f-g)^2\,d\rho+2\int(\bar P_\mu f-\bar P_\mu g)^2\,d\rho\le2\int(f-g)^2\,d\rho+2\int(f-g)^2\,d\bar\rho\times d\mu\le\varepsilon^2.$$

This means
that if $\{f_1,\dots,f_m\}$ is an $\frac\varepsilon2$-dense subset of $\mathcal F$ in the space $L_2(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2,\tilde\rho)$
with $\tilde\rho=\frac12(\rho+\bar\rho\times\mu)$, then $\{Q_\mu f_1,\dots,Q_\mu f_m\}$ is an $\varepsilon$-dense subset of $\mathcal G_\mu$ in the space
$L_2(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2,\rho)$. Moreover, if $\{f_1,\dots,f_m\}$ is an $\varepsilon$-dense subset with
respect to the measure $\frac12(\rho+\bar\rho\times\mu)$, then $\{\frac12Q_\mu f_1,\dots,\frac12Q_\mu f_m\}$ is an $\varepsilon$-dense subset of
$\mathcal G'_\mu$ in the space $L_2(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2,\rho)$.

To deduce Proposition 2 from Proposition 3 let us first introduce the (random)
probability measures $\mu^{(j)}$, $1\le j\le n$, concentrated in the sample points $\xi_j$, i.e. let
$\mu^{(j)}(A)=1$ if $\xi_j\in A$, and $\mu^{(j)}(A)=0$ if $\xi_j\notin A$, $A\in\mathcal X$. Then we can write
$\mu_n-\mu=\frac1n\left(\sum_{j=1}^n(\mu^{(j)}-\mu)\right)$, and formula (1.1) can be rewritten as

$$J_{n,k}(f)=\frac1{n^{k/2}k!}\sum_{\substack{(j_1,\dots,j_k)\\ 1\le j_s\le n,\ 1\le s\le k}}\int' f(u_1,\dots,u_k)\left(\mu^{(j_1)}(du_1)-\mu(du_1)\right)\dots\left(\mu^{(j_k)}(du_k)-\mu(du_k)\right).$$

To rearrange the above sum in a way more appropriate for us let us introduce the
following notations: Let $\mathcal P=\mathcal P_k$ denote the set of partitions of the set $\{1,2,\dots,k\}$, and
given a sequence $(j_1,\dots,j_k)$, $1\le j_s\le n$, $1\le s\le k$, of length $k$ let $H(j_1,\dots,j_k)$ denote
that partition of $\mathcal P_k$ in which two points $s$ and $t$, $1\le s,t\le k$, belong to the same element
of the partition if $j_s=j_t$. Given a set $A$, let $|A|$ denote its cardinality.
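
The map $(j_1,\dots,j_k)\mapsto H(j_1,\dots,j_k)$ is simple to compute. The sketch below is an illustration with a hypothetical function name; it returns the partition as a sorted list of blocks, each block listing the positions $s\in\{1,\dots,k\}$ that carry the same index $j_s$.

```python
def induced_partition(indices):
    """Partition H(j_1, ..., j_k) of {1, ..., k}: s and t share a block iff j_s == j_t."""
    blocks = {}
    for position, j in enumerate(indices, start=1):
        blocks.setdefault(j, []).append(position)
    return sorted(tuple(block) for block in blocks.values())

example = induced_partition((3, 5, 3))   # positions 1 and 3 carry the same index
```

Grouping the index sequences by the value of this map is what the rewritten sum below does.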
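Let us rewrite the above expression for $J_{n,k}(f)$ in the form

$$J_{n,k}(f)=\frac1{n^{k/2}k!}\sum_{P\in\mathcal P}\ \sum_{\substack{(j_1,\dots,j_k),\\ 1\le j_l\le n,\ 1\le l\le k\\ H(j_1,\dots,j_k)=P}}\int' f(u_1,\dots,u_k)\left(\mu^{(j_1)}(du_1)-\mu(du_1)\right)\dots\left(\mu^{(j_k)}(du_k)-\mu(du_k)\right). \tag{2.8}$$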

Let us remember that the diagonals $u_s=u_t$, $s\ne t$, were omitted from the domain of integration in the formula defining $J_{n,k}(f)$. This implies that in the case $j_s=j_t$ the measure $\mu^{(j_s)}(du_s)\mu^{(j_t)}(du_t)$ has zero measure in the domain of integration. We have to understand the cancellation effects caused by this relation. I want to show that because of these cancellations the expression in formula (2.8) can be rewritten as a linear combination of degenerate $U$-statistics with not too large coefficients. The $U$-statistics taking part in this linear combination can be bounded by means of Proposition 3, and this yields an estimate sufficient for our purposes. This seems to be a natural approach, but the detailed proof demands some rather unpleasant calculations.

Let us fix some $P\in\mathcal P$ and investigate the inner sum at the right-hand side of (2.8) corresponding to this partition $P$. For the sake of simplicity let us first consider such a sum that corresponds to a partition $P\in\mathcal P$ which contains a set of the form $\{1,\dots,s\}$ with some $s\ge2$. The products of measures corresponding to the terms in the sum determined by such a partition contain a part of length $s$ which has the form $(\mu^{(j)}(du_1)-\mu(du_1))\cdots(\mu^{(j)}(du_s)-\mu(du_s))$ with some $1\le j\le n$. This part of the product can be rewritten in the domain of integration as

$$\sum_{l=1}^s(-1)^{s-1}\mu(du_1)\cdots\mu(du_{l-1})\bigl(\mu^{(j)}(du_l)-\mu(du_l)\bigr)\mu(du_{l+1})\cdots\mu(du_s)+(-1)^{s-1}(s-1)\,\mu(du_1)\cdots\mu(du_s).$$
Here we exploit that all other terms of this product disappear in the domain of integration. Let us also observe that the term $(-1)^{s-1}(s-1)\mu(du_1)\cdots\mu(du_s)$ appears $n$ times as we sum up for $1\le j\le n$. A similar calculation can be made for all partitions $P\in\mathcal P$ and all sets contained in the partitions, only the notation of the indices will be more complicated. By carrying out such a calculation the quantity $J_{n,k}(f)$ can be rewritten as the linear combination of integrals of the function $f(u_1,\dots,u_k)$ with respect to some product measures. The components of these products of measures have either the form $(\mu^{(j_s)}(du_s)-\mu(du_s))$ or the form $\mu(du_s)$, and all indices $j_s$ in a product are different. Let us observe that to integrate a function $f$ with respect to $(\mu^{(j_s)}(du_s)-\mu(du_s))$ is the same as to apply the operator $Q_{\mu,s}=I-P_{\mu,s}$ to it and then to put $u_s=\xi_{j_s}$ in the $s$-th argument of the function $Q_{\mu,s}f$, and to integrate a function $f$ with respect to $\mu(du_s)$ is the same as to apply the operator $P_{\mu,s}$ to it. Here $Q_{\mu,s}$ and $P_{\mu,s}$ are the operators $Q_\mu$ and $P_\mu$ defined in formulas (2.6'') and (2.6) if we choose in these formulas $Y_1$ as the product of the first $s-1$ components, $Z$ as the $s$-th component and $Y_2$ as the product of the last $k-s$ components of the $k$-fold product $X^k$.
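
The cancellation identity for a block $\{1,\dots,s\}$ can be checked mechanically on a finite space. The following is only an illustrative sketch under assumptions that are not part of the paper: the space is $X=\{0,\dots,m-1\}$, the weights `mu` are an arbitrary choice, `xi` plays the role of the sample point $\xi_j$, and the helper name `check_block_identity` is hypothetical. It compares both sides at every point of $X^s$ with pairwise distinct coordinates, i.e. on the domain of integration with the diagonals removed.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def check_block_identity(mu, xi, s):
    """Compare both sides of the block identity at every off-diagonal point of X^s."""
    sign = (-1) ** (s - 1)

    def delta(u):
        # density of the point mass mu^{(j)} concentrated in xi
        return Fraction(int(u == xi))

    for u in permutations(range(len(mu)), s):  # tuples with distinct coordinates
        lhs = prod(delta(ul) - mu[ul] for ul in u)
        rhs = sign * (s - 1) * prod(mu[ul] for ul in u)
        for l in range(s):
            others = prod(mu[u[i]] for i in range(s) if i != l)
            rhs += sign * (delta(u[l]) - mu[u[l]]) * others
        if lhs != rhs:
            return False
    return True


# Arbitrary illustrative probability weights on a five-point space.
mu = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 16)]
identity_holds = all(
    check_block_identity(mu, xi, s) for xi in range(len(mu)) for s in (2, 3, 4)
)
print(identity_holds)
```

Exact rational arithmetic is used so that the comparison is an identity check rather than a floating-point approximation.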

Let us work out the details of the above indicated calculations and for all sets $V\subset\{1,\dots,k\}$ let us gather in an internal sum depending on $V$ those integrals for which the product of the measures contains a component of the form $(\mu^{(j_s)}(du_s)-\mu(du_s))$, $1\le j_s\le n$, if $s\in V$ and a term $\mu(du_s)$ if $s\notin V$. In such a way we get the identity

$$J_{n,k}(f)=\sum_{V\subset\{1,2,\dots,k\}}C(n,k,|V|)\,n^{-|V|/2}\,\frac1{|V|!}\sum_{\substack{1\le j_s\le n,\ j_s\ne j_{s'}\text{ if }s\ne s'\\ \text{for }s\in V}}f_V(\xi_{j_s},\,s\in V)\tag{2.9}$$

with the functions

$$f_V(u_s,\,s\in V)=\prod_{s\in V}Q_{\mu,s}\prod_{t\in\{1,\dots,k\}\setminus V}P_{\mu,t}\,f(u_1,\dots,u_k)\quad\text{for all }V\subset\{1,\dots,k\}\tag{2.10}$$

and some coefficients $C(n,k,|V|)$ which satisfy the inequality $|C(n,k,|V|)|\le G(k)$ with some constant $G(k)>0$. The explicit formula for $C(n,k,|V|)$ is rather complicated, but the above estimate about the magnitude of this coefficient is sufficient for our purposes. This estimate of $C(n,k,|V|)$ is sharp, because those partitions $P\in\mathcal P$ which contain the $|V|$ one-point subsets of a set $V$ and $(k-|V|)/2$ subsets of cardinality 2 of $\{1,\dots,k\}\setminus V$ yield a contribution of order $n^{-k/2}n^{k/2-|V|/2}$ to the coefficient $C(n,k,|V|)n^{-|V|/2}$.
Let us observe that the inner sum corresponding to a set $V$ at the right-hand side of (2.9) is a $U$-statistic with the kernel function $f_V$ defined in (2.10). Hence to carry out our program we have to understand the properties of this function $f_V$. It follows from Lemma 1 that under the conditions of Proposition 1 the set of functions $f_V$, $f\in\mathcal F$, is an $L_2$-dense class with exponent $L$ and parameter $2^{kL}D$, and $\int f_V^2(u_s,\,s\in V)\prod_{s\in V}\mu(du_s)\le\sigma^2$ for all $V\subset\{1,\dots,k\}$. Let me remark that this estimate states in particular that the constant term $f_\emptyset$ defined in (2.10) with the choice $V=\emptyset$ satisfies the inequality $|f_\emptyset|\le\sigma$. This estimate follows directly from the Schwarz inequality, because

$$f_\emptyset^2=\Bigl(\int f(u_1,\dots,u_k)\,\mu(du_1)\cdots\mu(du_k)\Bigr)^2\le\int f^2(u_1,\dots,u_k)\,\mu(du_1)\cdots\mu(du_k)\le\sigma^2.$$

Another important observation is that the functions $f_V$ are canonical kernel functions with respect to the measure $\mu$. To prove this statement let us observe that the canonical property of a kernel function $f_V$ can be reformulated as $P_{\mu,s}f_V(u_s,\,s\in V)\equiv0$ for all $s\in V$ and sets of parameters $u_t\in X$, $t\in V\setminus\{s\}$. This relation follows from the observation that the operators $P_{\mu,u}$, $1\le u\le k$, are exchangeable, and $P_{\mu,s}^2=P_{\mu,s}$, which implies that $P_{\mu,s}Q_{\mu,s}=P_{\mu,s}(I-P_{\mu,s})=0$. (Actually, here we adapted the proof of the Hoeffding decomposition of $U$-statistics to our case.)
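
The canonical property can be illustrated for $k=2$ on a finite space. The sketch below is not taken from the paper: the three-point space, the weights `mu`, the kernel `f` and the helper names `P`, `Q`, `is_zero` are illustrative assumptions. It builds the four functions $f_V$ from the operators $P_{\mu,s}$ and $Q_{\mu,s}=I-P_{\mu,s}$, checks that integrating out an argument in $V$ gives zero, and checks that the pieces add up to $f$.

```python
from fractions import Fraction


def P(f, mu, s):
    """Integrate out the s-th argument (s = 0 or 1) of a kernel f on X^2."""
    m = len(mu)
    if s == 0:
        avg = [sum(mu[u] * f[u][v] for u in range(m)) for v in range(m)]
        return [[avg[v] for v in range(m)] for _ in range(m)]
    avg = [sum(mu[v] * f[u][v] for v in range(m)) for u in range(m)]
    return [[avg[u] for _ in range(m)] for u in range(m)]


def Q(f, mu, s):
    """Apply Q = I - P in the s-th argument."""
    Pf = P(f, mu, s)
    m = len(mu)
    return [[f[u][v] - Pf[u][v] for v in range(m)] for u in range(m)]


def is_zero(g):
    return all(x == 0 for row in g for x in row)


mu = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]          # illustrative weights
f = [[Fraction(u * u + 3 * v + u * v) for v in range(3)] for u in range(3)]

f_12 = Q(Q(f, mu, 0), mu, 1)   # V = {1, 2}
f_1 = Q(P(f, mu, 1), mu, 0)    # V = {1}, constant in the second argument
f_2 = Q(P(f, mu, 0), mu, 1)    # V = {2}, constant in the first argument
f_0 = P(P(f, mu, 0), mu, 1)    # V = empty set, a constant

canonical = (
    is_zero(P(f_12, mu, 0)) and is_zero(P(f_12, mu, 1))
    and is_zero(P(f_1, mu, 0)) and is_zero(P(f_2, mu, 1))
)
decomposes = all(
    f[u][v] == f_0[u][v] + f_1[u][v] + f_2[u][v] + f_12[u][v]
    for u in range(3) for v in range(3)
)
print(canonical, decomposes)
```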

Formula (2.9) yields that

$$J_{n,k}(f)=\sum_{V\subset\{1,2,\dots,k\}}C(n,k,|V|)\,n^{-|V|/2}\,I_{n,|V|}(f_V),$$

and

$$P\Bigl(\sup_{f\in\mathcal F}|J_{n,k}(f)|>An^{k/2}\sigma^{k+1}\Bigr)\le\sum_{V\subset\{1,2,\dots,k\}}P\Bigl(\sup_{f\in\mathcal F}n^{-|V|/2}|I_{n,|V|}(f_V)|>\frac AT\,n^{k/2}\sigma^{k+1}\Bigr)$$

with some appropriate constant $T=T(k)$. Observe that under the conditions of Proposition 2 $n\sigma^2\ge1$, hence $n^{k/2}\sigma^{k+1}\ge n^{|V|/2}\sigma^{|V|+1}$. This means that if the parameters $A_0$ and $K$ are sufficiently large in the conditions of Proposition 2, then these conditions allow the application of Proposition 3 to bound the probability $P\bigl(n^{-|V|/2}|I_{n,|V|}(f_V)|>\frac AT n^{k/2}\sigma^{k+1}\bigr)\le P\bigl(n^{-|V|/2}|I_{n,|V|}(f_V)|>\frac AT n^{|V|/2}\sigma^{|V|+1}\bigr)$ for all functions $f_V$. Thus we get that the inequality

P ( sup |J nfc (/)| > An k / 2 a k+1 ) < C2 K ^(a/t^w < e -«A/*rf'* h ^ 

holds for $A>A_0$ with some $T=T(k)$ if first the constant $K$ and then the constant $A_0$ are chosen sufficiently large in the conditions of Proposition 2. This means that Proposition 3 implies Proposition 2.
3. Some basic tools of the proof

First I formulate three results we apply in the proof of Proposition 3. The first of 
them helps us to carry out some symmetrization arguments, the second one yields a 
good estimate for the distribution of a homogeneous polynomial of independent random 
variables which take values ±1 with probability 1/2. Finally, the third result enables us
to reduce Proposition 3 to a simpler statement. 

The first result, formulated in Lemma 2 is a slight generalization of a simple lemma 
which can be found for instance in Pollard's book [9] (8° Symmetrization Lemma). I 
made this generalization, because it is more appropriate for our purposes. 

Lemma 2. (Symmetrization Lemma) Let $Z_n$ and $\bar Z_n$, $n=1,2,\dots$, be two sequences of random variables on a probability space $(\Omega,\mathcal A,P)$. Let a $\sigma$-algebra $\mathcal B\subset\mathcal A$ be given on the probability space $(\Omega,\mathcal A,P)$ together with a $\mathcal B$ measurable set $B$ and two numbers $\alpha>0$ and $\beta>0$ such that the random variables $Z_n$, $n=1,2,\dots$, are $\mathcal B$ measurable, and the inequality

$$P(|\bar Z_n|\le\alpha\,|\,\mathcal B)(\omega)\ge\beta\quad\text{for all }n=1,2,\dots\text{ if }\omega\in B\tag{3.1}$$

holds. Then

$$P\Bigl(\sup_{1\le n<\infty}|Z_n|>\alpha+x\Bigr)\le\frac1\beta P\Bigl(\sup_{1\le n<\infty}|Z_n-\bar Z_n|>x\Bigr)+(1-P(B))\quad\text{for all }x>0.\tag{3.2}$$

In particular, if the sequences $Z_n$, $n=1,2,\dots$, and $\bar Z_n$, $n=1,2,\dots$, are two independent sequences of random variables, and $P(|\bar Z_n|\le\alpha)\ge\beta$ for all $n=1,2,\dots$, then

$$P\Bigl(\sup_{1\le n<\infty}|Z_n|>\alpha+x\Bigr)\le\frac1\beta P\Bigl(\sup_{1\le n<\infty}|Z_n-\bar Z_n|>x\Bigr).\tag{3.2'}$$

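Proof of Lemma 2. Put $\tau=\min\{n\colon|Z_n|>\alpha+x\}$ if there exists such an $n$, and $\tau=0$ otherwise. Then

$$P(\{\tau=n\}\cap B)\le\frac1\beta\int_{\{\tau=n\}\cap B}P(|\bar Z_n|\le\alpha\,|\,\mathcal B)\,dP=\frac1\beta P(\{\tau=n\}\cap\{|\bar Z_n|\le\alpha\}\cap B)\le\frac1\beta P(\{\tau=n\}\cap\{|Z_n-\bar Z_n|>x\})\quad\text{for all }n=1,2,\dots.$$

Hence

$$P\Bigl(\sup_{1\le n<\infty}|Z_n|>\alpha+x\Bigr)-(1-P(B))\le P\Bigl(\Bigl\{\sup_{1\le n<\infty}|Z_n|>\alpha+x\Bigr\}\cap B\Bigr)=\sum_{n=1}^\infty P(\{\tau=n\}\cap B)\le\frac1\beta\sum_{n=1}^\infty P(\{\tau=n\}\cap\{|Z_n-\bar Z_n|>x\})\le\frac1\beta P\Bigl(\sup_{1\le n<\infty}|Z_n-\bar Z_n|>x\Bigr).$$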
Thus formula (3.2) is proved. If $Z_n$ and $\bar Z_n$ are two independent sequences, and $P(|\bar Z_n|\le\alpha)\ge\beta$ for all $n=1,2,\dots$, and we define $\mathcal B$ as the $\sigma$-algebra generated by the random variables $Z_n$, $n=1,2,\dots$, then the condition (3.1) is satisfied also with $B=\Omega$. Hence relation (3.2') holds in this case. Lemma 2 is proved.

The second result we need is a multi-dimensional version of Hoeffding's inequality 
formulated in Proposition A: 

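Proposition A. Let $\varepsilon_1,\dots,\varepsilon_n$ be independent random variables, $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, $1\le j\le n$. Fix a positive integer $k$ and define the random variable

$$Z=\sum_{\substack{(j_1,\dots,j_k)\colon 1\le j_l\le n\text{ for all }1\le l\le k\\ j_l\ne j_{l'}\text{ if }l\ne l'}}a(j_1,\dots,j_k)\,\varepsilon_{j_1}\cdots\varepsilon_{j_k}\tag{3.3}$$

with the help of some real numbers $a(j_1,\dots,j_k)$ which are given for all sets of indices such that $1\le j_l\le n$, $1\le l\le k$, and $j_l\ne j_{l'}$ if $l\ne l'$. Put

$$S^2=\sum_{\substack{(j_1,\dots,j_k)\colon 1\le j_l\le n\text{ for all }1\le l\le k\\ j_l\ne j_{l'}\text{ if }l\ne l'}}a^2(j_1,\dots,j_k).\tag{3.4}$$

Then

$$P(|Z|>x)\le C\exp\Bigl\{-B\Bigl(\frac xS\Bigr)^{2/k}\Bigr\}\quad\text{for all }x>0\tag{3.5}$$

with some constants $B>0$ and $C>0$ depending only on the parameter $k$. Relation (3.5) holds for instance with the choice $B=\frac k{2e(k!)^{1/k}}$ and $C=e^k$.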

Proposition A is a relatively simple consequence of a famous and important result of probability theory, the so-called hypercontractive inequality for Rademacher functions (see e.g. [3] or [6]). The hypercontractive inequality yields some moment inequalities that imply Proposition A. Nevertheless, I did not find this result in the literature. Therefore I explain in the Appendix how it follows from the hypercontractive inequality.

Remark: The parameter B given in Proposition A is not sharp. This is because the 
moment estimates I could prove are not sharp enough. They are sufficient to give the 
right order of the term in the exponent at the right-hand side of (3.5) but do not give 
the best possible constant B in this estimate. 
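
To get a feeling for how loose the constants in (3.5) are, one can compare the bound with an exactly computable tail. The sketch below is only an illustration under stated assumptions: it takes $k=2$ and the special coefficients $a(j_1,j_2)=1$ for all $j_1\ne j_2$, so that $Z=(\sum_j\varepsilon_j)^2-n$ and $S^2=n(n-1)$, and it uses the constants $B=k/(2e(k!)^{1/k})$ and $C=e^k$, which are the values obtained from the standard hypercontractive moment argument and should be treated here as an assumption of the sketch. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, e, exp, factorial, sqrt


def exact_tail(n, x):
    """Exact P(|Z| > x) for Z = sum over j1 != j2 of eps_j1 * eps_j2 = (sum eps)^2 - n."""
    total = Fraction(0)
    for m in range(n + 1):                 # m = number of signs equal to +1
        z = (2 * m - n) ** 2 - n
        if abs(z) > x:
            total += Fraction(comb(n, m), 2 ** n)
    return total


def tail_bound(x, S, k):
    """Right-hand side C * exp(-B (x/S)^(2/k)) with the assumed constants B and C."""
    B = k / (2 * e * factorial(k) ** (1 / k))
    C = e ** k
    return C * exp(-B * (x / S) ** (2 / k))


n, k = 30, 2
S = sqrt(n * (n - 1))                      # all coefficients equal to 1
rows = [(t, float(exact_tail(n, t * S)), tail_bound(t * S, S, k)) for t in (2, 5, 8, 12, 20, 29)]
bound_holds = all(p <= b for _, p, b in rows)
for t, p, b in rows:
    print(f"x = {t:2d} S   exact tail = {p:.3e}   bound = {b:.3e}")
```

In this example the bound is valid but far from the exact tail, in line with the remark that the parameter $B$ is not sharp.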

Finally I formulate a decoupling type result which enables us to reduce Proposition 3 to a similar but simpler statement. This result compares the distribution of $U$-statistics with the distribution of such systems whose coordinates are chosen independently from each other. To make a clear distinction between this object and usual $U$-statistics I shall call it independent $U$-statistics. It is defined in the following way:
Definition of independent $U$-statistics. Let us have $k$ independent copies $\xi_{1,s},\dots,\xi_{n,s}$, $1\le s\le k$, of a sequence of independent and identically distributed random variables $\xi_1,\dots,\xi_n$ with distribution $\mu$ on a measurable space $(X,\mathcal X)$ together with a function $f=f(x_1,\dots,x_k)$ on the $k$-th power $(X^k,\mathcal X^k)$ of the space $(X,\mathcal X)$. We define with their help the independent $U$-statistic $\bar I_{n,k}(f)$ by the formula

$$\bar I_{n,k}(f)=\frac1{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}f(\xi_{j_1,1},\dots,\xi_{j_k,k}).\tag{3.6}$$
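
The difference between a usual and an independent $U$-statistic is easy to see in code. This is a minimal sketch, assuming the normalization $1/k!$ and a sum over $k$-tuples of pairwise different indices; the function names `u_statistic` and `independent_u_statistic` are hypothetical and the brute-force enumeration is only meant for small $n$.

```python
from itertools import permutations
from math import factorial


def u_statistic(f, xs, k):
    """(1/k!) * sum of f(xs[j1], ..., xs[jk]) over ordered k-tuples of distinct indices."""
    n = len(xs)
    total = sum(f(*(xs[j] for j in js)) for js in permutations(range(n), k))
    return total / factorial(k)


def independent_u_statistic(f, copies):
    """Same sum, but the s-th argument is read from the s-th independent copy."""
    k, n = len(copies), len(copies[0])
    total = sum(
        f(*(copies[s][js[s]] for s in range(k))) for js in permutations(range(n), k)
    )
    return total / factorial(k)


def kernel(x, y):
    return x * y


# One sample for the usual U-statistic, two separate samples for the independent one.
print(u_statistic(kernel, [1, 2, 3], 2))
print(independent_u_statistic(kernel, [[1, 2, 3], [10, 20, 30]]))
```

In the first call every argument of the kernel is taken from the same sample, in the second call the $s$-th argument always comes from the $s$-th copy.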



The following Proposition B holds. 

Proposition B. Let us consider a countable sequence $f_l(x_1,\dots,x_k)$, $l=1,2,\dots$, of functions on the $k$-fold product $(X^k,\mathcal X^k)$ of some space $(X,\mathcal X)$ together with some probability measure $\mu$ on the space $(X,\mathcal X)$. Given a sequence of independent and identically distributed random variables $\xi_1,\xi_2,\dots$ with distribution $\mu$ on $(X,\mathcal X)$ together with $k$ independent copies $\xi_{1,s},\xi_{2,s},\dots$, $1\le s\le k$, of it we can define the $U$-statistics $I_{n,k}(f_l)$ and independent $U$-statistics $\bar I_{n,k}(f_l)$ for all $l=1,2,\dots$ and $n=1,2,\dots$. They satisfy the inequality

$$P\Bigl(\sup_{1\le l<\infty}|I_{n,k}(f_l)|>x\Bigr)\le A\,P\Bigl(\sup_{1\le l<\infty}|\bar I_{n,k}(f_l)|>\gamma x\Bigr)\tag{3.7}$$

for all $x>0$ with some constants $A=A(k)>0$ and $\gamma=\gamma(k)>0$ depending only on the order $k$ of the $U$-statistics.

I shall deduce Proposition B from the result of paper [5] of de la Peña and Montgomery-Smith. At first sight one would think that this result is not sufficient for our purposes, since it compares the distribution function of a single $U$-statistic with its independent $U$-statistic counterpart, i.e. the supremum with respect to a class of functions is missing there. But this result is proved for general Banach space valued random variables. Therefore, as I show below, its application for an appropriate $L_\infty$ space yields the desired result.

The proof of Proposition B (with the help of paper [5].) Let us apply the first part of Theorem 1 of [5] in the Banach space $\ell_\infty$ consisting of the infinite sequences $x=(x_1,x_2,\dots)$ of real numbers with norm $\|x\|=\sup_{1\le l<\infty}|x_l|$ for the kernel functions $f_{j_1,\dots,j_k}(x_1,\dots,x_k)=f(x_1,\dots,x_k)$, $f=(f_1,f_2,\dots)$, mapping the space $(X^k,\mathcal X^k)$ into the space $\ell_\infty$. (Here we do not exploit that in the result of [5] the kernel functions may depend on the indices $(j_1,\dots,j_k)$.) Then the result in [5] states that

$$P\Bigl(\Bigl\|\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}f(\xi_{j_1},\dots,\xi_{j_k})\Bigr\|>x\Bigr)\le A\,P\Bigl(\Bigl\|\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}f(\xi_{j_1,1},\dots,\xi_{j_k,k})\Bigr\|>\gamma x\Bigr)$$

with some universal constants $A=A(k)>0$ and $\gamma=\gamma(k)>0$, and this statement is equivalent to relation (3.7).

Remark: Actually it would be enough to prove Proposition B only for the supremum of finitely many $U$-statistics with kernel functions $f_1,\dots,f_N$ and then to let $N\to\infty$. In such a way we can avoid the work with infinite dimensional Banach spaces. Such an approach makes the proof simpler, in particular because some measure-theoretical problems arise if we are working with Banach spaces where not all continuous linear functionals are measurable. Such a difficulty really occurs if we are working with $L_\infty(X)$ spaces with a set $X$ of large cardinality. If we want to apply the result of [5] in the space $\ell_\infty$, then we have to check that it is applicable in this case.

Now I formulate the following Proposition 3'. 

Proposition 3'. Let us have a probability measure $\mu$ on a space $(X,\mathcal X)$ together with $k$ independent copies $\xi_{1,s},\dots,\xi_{n,s}$, $1\le s\le k$, of a sequence of independent and $\mu$ distributed random variables $\xi_1,\dots,\xi_n$ and a countable $L_2$-dense class $\mathcal F$ of canonical kernel functions $f=f(x_1,\dots,x_k)$ (with respect to the measure $\mu$) with some parameter $D$ and exponent $L$ on the product space $(X^k,\mathcal X^k)$ which satisfies conditions (1.2), (1.3) and (1.4) with some $\sigma>0$. Let $n\sigma^2>K((L+\beta)\log n+1)$ with a sufficiently large constant $K=K(k)$. Then there exists some threshold index $A_0=A_0(k)>0$ such that the independent $U$-statistics $\bar I_{n,k}(f)$, $f\in\mathcal F$, defined in (3.6) satisfy the inequality

$$P\Bigl(\sup_{f\in\mathcal F}|n^{-k/2}\bar I_{n,k}(f)|>An^{k/2}\sigma^{k+1}\Bigr)\le e^{-A^{1/2k}n\sigma^2}\quad\text{if }A>A_0.$$

Proposition 3' and Proposition B imply Proposition 3. The proof of Proposition 3' applies some ideas of a paper of Alexander [1]. Let me briefly explain them.

Let us restrict our attention to the case $k=1$. In this case a probability of the form $P\Bigl(n^{-1/2}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|>x\Bigr)$ has to be estimated. By taking an independent copy of the sequence $\xi_1,\dots,\xi_n$ (which disappears at the end of the calculation) a symmetrization argument can be applied which reduces the problem to the estimation of the probability $P\Bigl(n^{-1/2}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|>\bar x\Bigr)$, where the random variables $\varepsilon_j$, $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, $j=1,\dots,n$, are independent, and they are independent also of the random variables $\xi_j$. Beside this, the number $\bar x$ is only slightly smaller than the number $x/2$. Let us bound the conditional probability of the event we have just introduced if the values of the random variables $\xi_j$ are prescribed in it. This conditional probability can be bounded by means of the one-dimensional version of Proposition A, and the estimate we get in such a way is useful if the conditional variance of the random variable we have to handle has a good upper bound. Such a bound exists, and some calculation reduces the original problem to the estimation of the probability $P\Bigl(n^{-1/2}\sup_{f\in\mathcal F'}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|>x^{1+\alpha}\Bigr)$ with some new nice class of functions $\mathcal F'$ and number $\alpha>0$. This problem is very similar to the original one, but it is simpler, since the number $x$ is replaced by a larger number $x^{1+\alpha}$ in it. By repeating this argument successively, in finitely many steps we get to an inequality that clearly holds.

The above sketched argument suggests a backward induction procedure to prove Proposition 3'. To carry out such a program I shall prove a result formulated in Proposition 4. To do this first I introduce the following notion.

Definition of good tail behaviour for a class of $U$-statistics. Let us have some measurable space $(X,\mathcal X)$ and a probability measure $\mu$ on it. Let us consider some class $\mathcal F$ of functions $f(x_1,\dots,x_k)$ on the $k$-fold product $(X^k,\mathcal X^k)$ of the space $(X,\mathcal X)$. Fix some positive integer $n$ and positive number $\sigma>0$, and take $k$ independent copies $\xi_{1,s},\dots,\xi_{n,s}$, $1\le s\le k$, of a sequence of independent $\mu$-distributed random variables $\xi_1,\dots,\xi_n$. Let us introduce with the help of these random variables the independent $U$-statistics $\bar I_{n,k}(f)$, $f\in\mathcal F$. Given some real number $T>0$ we say that the set of independent $U$-statistics determined by the class of functions $\mathcal F$ has a good tail behaviour at level $T$ if the inequality

$$P\Bigl(\sup_{f\in\mathcal F}|n^{-k/2}\bar I_{n,k}(f)|>An^{k/2}\sigma^{k+1}\Bigr)\le\exp\bigl\{-A^{1/2k}n\sigma^2\bigr\}\quad\text{for all }A>T\tag{3.8}$$

holds.

Now I formulate Proposition 4. 

Proposition 4. Let us fix a positive integer $n$, real number $\sigma>0$ and a probability measure $\mu$ on a measurable space $(X,\mathcal X)$ together with a countable $L_2$-dense class $\mathcal F$ of canonical kernel functions $f=f(x_1,\dots,x_k)$ (with respect to the measure $\mu$) on the $k$-fold product space $(X^k,\mathcal X^k)$ which has exponent $L$ and parameter $D$, and the number $D$ satisfies condition (1.4). Let us also assume that all functions $f\in\mathcal F$ satisfy the conditions $\sup_{x_j\in X,\,1\le j\le k}|f(x_1,\dots,x_k)|\le2^{-(k+1)}$, $\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\cdots\mu(dx_k)\le\sigma^2$, and $n\sigma^2>K((L+\beta)\log n+1)$ with a sufficiently large fixed number $K=K(k)$. There exists some real number $A_0=A_0(k)>1$ such that for all classes of functions $\mathcal F$ which satisfy the conditions of Proposition 4 the sets of $U$-statistics determined by the functions $f\in\mathcal F$ have a good tail behaviour at level $T$ for some $T>A_0$, provided that they have a good tail behaviour at level $T^{4/3}$.

It is not difficult to deduce Proposition 3' from Proposition 4. Indeed, let us observe that the set of $U$-statistics determined by a class of functions $\mathcal F$ satisfying the conditions of Proposition 4 has a good tail-behaviour at level $n^{k/2}$, since the probability at the left-hand side of (3.8) equals zero for $A>n^{k/2}$. Then we get from Proposition 4 by induction with respect to the number $j$ that this set of $U$-statistics has a good tail-behaviour also for $T=n^{(3/4)^jk/2}$ for $j=1,2,\dots$ if $n^{(3/4)^jk/2}\ge A_0$. This implies that if a class of functions $\mathcal F$ satisfies the conditions of Proposition 4, then the set of $U$-statistics determined by this class of functions has a good tail-behaviour at level $T=A_0^{4/3}$, i.e. at a level which depends only on the order $k$ of the (independent) $U$-statistics. This result implies Proposition 3', only we have to apply it not directly for the class of functions $\mathcal F$ appearing in Proposition 3', but these functions have to be multiplied by a sufficiently small positive number depending only on $k$.
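
The backward induction can be pictured as a short sequence of levels. The sketch below is only an illustration: the values of `n`, `k` and `A0` are arbitrary assumptions (the paper does not give a numerical value for $A_0$), and the helper name `induction_levels` is hypothetical. Starting from the trivial level $n^{k/2}$, each application of Proposition 4 replaces a level $T^{4/3}$ by $T$, as long as $T\ge A_0$.

```python
def induction_levels(n, k, A0):
    """Levels T_0 = n^(k/2), T_{j+1} = T_j^(3/4), kept while they stay >= A0 (A0 > 1)."""
    levels = [n ** (k / 2)]
    while levels[-1] ** 0.75 >= A0:
        levels.append(levels[-1] ** 0.75)
    return levels


# Illustrative values only; A0 = 16 is an assumed threshold, not a constant from the text.
levels = induction_levels(n=10**6, k=2, A0=16.0)
print(len(levels) - 1, "induction steps, final level", levels[-1])
```

The loop stops at the first level $T$ with $T^{3/4}<A_0$, so the final level is below $A_0^{4/3}$, and the number of steps grows only like $\log\log n$.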

Thus to complete the proof of the Theorem it is enough to prove Proposition 4. I 
describe its proof in the special case k = 1 in the next section. This case is considered 
separately, because it may help to understand the ideas of the proof in the general case. 

The main difficulty in the proof of Proposition 4 is related to a symmetrization procedure which is an essential part of the proof. We want to apply some randomization with the help of a symmetrization argument, and this requires a special justification. This is not a difficult problem in the case $k=1$, where it is enough to calculate the variance of a $U$-statistic, but it becomes hard for $k\ge2$. In this case we have to give a good estimate on certain conditional variances of some (independent) $U$-statistics with respect to some appropriate conditions. To overcome this difficulty we formulate a result in Proposition 5 and prove Propositions 4 and 5 simultaneously. Their proofs proceed as follows. First Proposition 4 and Proposition 5 are proved for $k=1$. Then, if Propositions 4 and 5 are already proven for all $k'<k$, we first prove Proposition 4 for $k$, and then Proposition 5 for the same $k$. Proposition 5 has a similar structure to Proposition 4. Before its formulation I introduce the following definition.

Definition of good tail behaviour for a class of integrals of $U$-statistics. Let us have a product space $(X^k\times Y,\mathcal X^k\times\mathcal Y)$ with some product measure $\mu^k\times\rho$, where $(X^k,\mathcal X^k,\mu^k)$ is the $k$-fold product of some probability space $(X,\mathcal X,\mu)$, and $(Y,\mathcal Y,\rho)$ is some other probability space. Fix some positive integer $n$ and positive number $\sigma>0$, and consider some class $\mathcal F$ of functions $f(x_1,\dots,x_k,y)$ on the product space $(X^k\times Y,\mathcal X^k\times\mathcal Y,\mu^k\times\rho)$. Take $k$ independent copies $\xi_{1,s},\dots,\xi_{n,s}$, $1\le s\le k$, of a sequence of independent, $\mu$-distributed random variables $\xi_1,\dots,\xi_n$. For all $f\in\mathcal F$ and $y\in Y$ let us define the independent $U$-statistics $\bar I_{n,k}(f,y)$ by means of these random variables $\xi_{1,s},\dots,\xi_{n,s}$, $1\le s\le k$, and formula (3.6). Define with the help of these $U$-statistics $\bar I_{n,k}(f,y)$ the random integrals

$$H_{n,k}(f)=\int\bar I_{n,k}(f,y)^2\,\rho(dy),\quad f\in\mathcal F.\tag{3.9}$$

Choose some real number $T>0$. We say that the set of random integrals $H_{n,k}(f)$, $f\in\mathcal F$, have a good tail behaviour at level $T$ if

$$P\Bigl(\sup_{f\in\mathcal F}n^{-k}H_{n,k}(f)>A^2n^k\sigma^{2k+2}\Bigr)\le\exp\bigl\{-A^{1/(2k+1)}n\sigma^2\bigr\}\quad\text{for }A>T.$$

Proposition 5. Fix some positive integer $n$ and real number $\sigma>0$, and let us have a product space $(X^k\times Y,\mathcal X^k\times\mathcal Y)$ with some product measure $\mu^k\times\rho$, where $(X^k,\mathcal X^k,\mu^k)$ is the $k$-fold product of some probability space $(X,\mathcal X,\mu)$, and $(Y,\mathcal Y,\rho)$ is some other probability space. Let us have a countable $L_2$-dense class $\mathcal F$ of canonical functions $f(x_1,\dots,x_k,y)$ on the product space $(X^k\times Y,\mathcal X^k\times\mathcal Y,\mu^k\times\rho)$ with some exponent $L$ and parameter $D$ which satisfies condition (1.4). Let us also assume that the functions $f\in\mathcal F$ satisfy the conditions

$$\sup_{x_j\in X,\,1\le j\le k,\,y\in Y}|f(x_1,\dots,x_k,y)|\le2^{-(k+1)}$$

and

$$\int f^2(x_1,\dots,x_k,y)\,\mu(dx_1)\cdots\mu(dx_k)\,\rho(dy)\le\sigma^2\quad\text{for all }f\in\mathcal F.$$

Let the inequality $n\sigma^2>K((L+\beta)\log n+1)$ hold with a sufficiently large fixed number $K=K(k)$.

There exists some number $A_0=A_0(k)>1$ such that for all classes of functions $\mathcal F$ which satisfy the conditions of Proposition 5 the random integrals $H_{n,k}(f)$, $f\in\mathcal F$, defined in (3.9) have a good tail behaviour at level $T$, provided that they have a good tail behaviour at level $T^{(2k+1)/2k}$.

Similarly to the argument formulated after Proposition 4 an inductive procedure 
yields the following corollary of Proposition 5. 

Corollary of Proposition 5. If the class of functions $\mathcal F$ satisfies the conditions of Proposition 5, then there exists a constant $A_0=A_0(k)>0$ depending only on $k$ such that the integrals $H_{n,k}(f)$ determined by the class of functions $\mathcal F$ have a good tail behaviour at level $A_0$.

4. The proof of Proposition 4 in the case k = 1 

In this section Proposition 4 is proved in the special case $k=1$. In this case we have to show that

$$P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|>An^{1/2}\sigma^2\Bigr)\le e^{-A^{1/2}n\sigma^2}\quad\text{if }A>T\tag{4.1}$$

if we know the same estimate for $A>T^{4/3}$ and all classes of functions satisfying the conditions of Proposition 4. This statement will be proved by means of the following symmetrization argument.

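Lemma 3. Let the class of functions $\mathcal F$ satisfy the conditions of Proposition 4 for $k=1$. Let $\varepsilon_1,\dots,\varepsilon_n$ be a sequence of independent random variables, $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, independent also of the $\mu$ distributed random variables $\xi_1,\dots,\xi_n$. Then

$$P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|>An^{1/2}\sigma^2\Bigr)\le4P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|>\frac A3n^{1/2}\sigma^2\Bigr)\quad\text{if }A>T.\tag{4.2}$$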

Proof of Lemma 3. Let us construct an independent copy $\bar\xi_1,\dots,\bar\xi_n$ of the sequence $\xi_1,\dots,\xi_n$ in such a way that all three sequences $\xi_1,\dots,\xi_n$, $\bar\xi_1,\dots,\bar\xi_n$ and $\varepsilon_1,\dots,\varepsilon_n$ are independent. Define the random variables $Z_n(f)=\frac1{\sqrt n}\sum_{j=1}^nf(\xi_j)$ and $\bar Z_n(f)=\frac1{\sqrt n}\sum_{j=1}^nf(\bar\xi_j)$ for all $f\in\mathcal F$. I claim that

$$P\Bigl(\sup_{f\in\mathcal F}|Z_n(f)|>A\sqrt n\sigma^2\Bigr)\le2P\Bigl(\sup_{f\in\mathcal F}|Z_n(f)-\bar Z_n(f)|>\frac23A\sqrt n\sigma^2\Bigr).\tag{4.3}$$

This relation follows from Lemma 2 (the symmetrization lemma) applied for the countable sets $Z_n(f)$ and $\bar Z_n(f)$, $f\in\mathcal F$, with $x=\frac23A\sqrt n\sigma^2$ and $\alpha=\frac13A\sqrt n\sigma^2$, since the fields $Z_n(f)$ and $\bar Z_n(f)$ are independent, and $P(|\bar Z_n(f)|\le\alpha)\ge\frac12$ for all $f\in\mathcal F$. Indeed, $E\bar Z_n(f)^2\le\sigma^2$, thus Chebishev's inequality implies that $P(|\bar Z_n(f)|\le\sqrt2\sigma)\ge\frac12$ for all $f\in\mathcal F$. On the other hand, we have assumed that $n\sigma^2\ge K$ with some sufficiently large constant $K>0$. Hence $\sigma\le\frac1{\sqrt K}\sqrt n\sigma^2$, and $\sqrt2\sigma\le\alpha=\frac13A\sqrt n\sigma^2$ if the constant $K$ is chosen sufficiently large.

Let us observe that the random field

$$Z_n(f)-\bar Z_n(f)=\frac1{\sqrt n}\sum_{j=1}^n\bigl(f(\xi_j)-f(\bar\xi_j)\bigr),\quad f\in\mathcal F,\tag{4.4}$$

and its randomization

$$\frac1{\sqrt n}\sum_{j=1}^n\varepsilon_j\bigl(f(\xi_j)-f(\bar\xi_j)\bigr),\quad f\in\mathcal F,\tag{4.4'}$$

have the same distribution. Indeed, even the conditional distribution of (4.4') under the condition that the values of the $\varepsilon_j$-s are prescribed agrees with the distribution of (4.4) for all possible values of the $\varepsilon_j$-s. This follows from the observation that the distribution of the field (4.4) does not change if we exchange the random variables $\xi_j$ and $\bar\xi_j$ for certain indices $j$, and this corresponds to considering the conditional distribution of the field in (4.4') under the condition that $\varepsilon_j=-1$ for these indices $j$, and $\varepsilon_j=1$ for the remaining ones.
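
The equality in distribution of the symmetrized field and of its randomization can be verified exactly on a small finite example. The sketch below is only an illustration under assumptions not taken from the paper: a three-point space with weights `probs`, a class consisting of two functions given by their value tables `fs`, sample size `n = 3`, and the normalizing factor $n^{-1/2}$ omitted since it does not affect the comparison. The helper name `field_laws` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def field_laws(fs, probs, n):
    """Exact joint laws of (sum_j (f(xi_j) - f(xibar_j)))_f and of its randomized version."""
    plain, randomized = Counter(), Counter()
    idx = range(len(probs))
    for xs in product(idx, repeat=n):          # values of xi_1, ..., xi_n
        for ys in product(idx, repeat=n):      # values of the independent copy
            w = Fraction(1)
            for i in xs + ys:
                w *= probs[i]
            diffs = [[f[x] - f[y] for x, y in zip(xs, ys)] for f in fs]
            plain[tuple(sum(d) for d in diffs)] += w
            for eps in product((1, -1), repeat=n):   # all sign patterns, weight 2^-n each
                key = tuple(sum(e * dj for e, dj in zip(eps, d)) for d in diffs)
                randomized[key] += w / 2 ** n
    return plain, randomized


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
fs = [[0, 1, 4], [2, -1, 5]]                   # two functions on {0, 1, 2}
plain, randomized = field_laws(fs, probs, n=3)
same_law = plain == randomized
print(same_law)
```

The joint law over both functions is compared, which mirrors the fact that the statement concerns the whole random field and not only its one-dimensional marginals.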
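The above relation together with formula (4.3) imply that

$$\begin{aligned}
P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|>An^{1/2}\sigma^2\Bigr)
&\le2P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^n\varepsilon_j\bigl(f(\xi_j)-f(\bar\xi_j)\bigr)\Bigr|>\frac23An^{1/2}\sigma^2\Bigr)\\
&\le2P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|>\frac A3n^{1/2}\sigma^2\Bigr)+2P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^n\varepsilon_jf(\bar\xi_j)\Bigr|>\frac A3n^{1/2}\sigma^2\Bigr)\\
&=4P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|>\frac A3n^{1/2}\sigma^2\Bigr).
\end{aligned}$$

Lemma 3 is proved.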

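To prove Proposition 4 for $k=1$ let us investigate the conditional probability

$$P(f,A\,|\,\xi_1,\dots,\xi_n)=P\Bigl(\frac1{\sqrt n}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|>\frac A6\sqrt n\sigma^2\,\Big|\,\xi_1,\dots,\xi_n\Bigr)$$

for all functions $f\in\mathcal F$, $A>T$ and values $(\xi_1,\dots,\xi_n)$. By Proposition A (with $k=1$) we can write

$$P(f,A\,|\,\xi_1,\dots,\xi_n)\le C\exp\Bigl\{-\frac{BA^2n\sigma^4}{36\,S^2(f,\xi_1,\dots,\xi_n)}\Bigr\}\tag{4.5}$$

with

$$S^2(f,x_1,\dots,x_n)=\frac1n\sum_{j=1}^nf^2(x_j),\quad f\in\mathcal F.$$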

Let us introduce the set

$$H=H(A)=\Bigl\{(x_1,\dots,x_n)\colon\sup_{f\in\mathcal F}S^2(f,x_1,\dots,x_n)>\bigl(1+A^{4/3}\bigr)\sigma^2\Bigr\}.\tag{4.6}$$

I claim that

$$P((\xi_1,\dots,\xi_n)\in H)\le e^{-A^{2/3}n\sigma^2}\quad\text{if }A>T.\tag{4.6'}$$


To prove relation (4.6') let us consider the functions $\bar f=\bar f(f)$ for all $f\in\mathcal F$ defined by the formula $\bar f(x)=f^2(x)-\int f^2(x)\,\mu(dx)$, and introduce the class of functions $\mathcal F'=\{\bar f(f)\colon f\in\mathcal F\}$. Let us show that the class of functions $\mathcal F'$ satisfies the conditions of Proposition 4, hence the estimate (4.1) holds for the class of functions $\mathcal F'$ if $A>T^{4/3}$.

The relation $\int\bar f(x)\,\mu(dx)=0$ clearly holds. (In the case $k=1$ this means that $\bar f$ is a canonical function.) The condition $\sup|\bar f(x)|\le\frac1{16}\le\frac14$ also holds if $\sup|f(x)|\le\frac14$, and $\int\bar f^2(x)\,\mu(dx)\le\int f^4(x)\,\mu(dx)\le\frac1{16}\int f^2(x)\,\mu(dx)\le\frac{\sigma^2}{16}\le\sigma^2$ if $f\in\mathcal F$. It remained to show that $\mathcal F'$ is an $L_2$-dense class with exponent $L$ and parameter $D$.

To show this observe that

$$\int(\bar f(x)-\bar g(x))^2\rho(dx)\le2\int(f^2(x)-g^2(x))^2\rho(dx)+2\int(f^2(x)-g^2(x))^2\mu(dx)\le2\bigl(\sup(|f(x)|+|g(x)|)\bigr)^2\int(f(x)-g(x))^2\bigl(\rho(dx)+\mu(dx)\bigr)\le\int(f(x)-g(x))^2\bar\rho(dx)$$

for all $f,g\in\mathcal F$, $\bar f=\bar f(f)$, $\bar g=\bar g(g)$ and probability measure $\rho$, where $\bar\rho=\frac{\rho+\mu}2$. This means that if $\{f_1,\dots,f_m\}$ is an $\varepsilon$-dense subset of $\mathcal F$ in the space $L_2(X,\mathcal X,\bar\rho)$, then $\{\bar f_1,\dots,\bar f_m\}$ is an $\varepsilon$-dense subset of $\mathcal F'$ in the space $L_2(X,\mathcal X,\rho)$, and not only $\mathcal F$, but also $\mathcal F'$ is an $L_2$-dense class with exponent $L$ and parameter $D$.

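We get, by applying formula (4.1) for the number $A^{4/3}>T^{4/3}$ and the class of functions $\mathcal F'$, that

$$\begin{aligned}
P((\xi_1,\dots,\xi_n)\in H)&=P\Bigl(\sup_{f\in\mathcal F}\Bigl(\frac1n\sum_{j=1}^n\bar f(\xi_j)+\frac1n\sum_{j=1}^nEf^2(\xi_j)\Bigr)>\bigl(1+A^{4/3}\bigr)\sigma^2\Bigr)\\
&\le P\Bigl(\sup_{\bar f\in\mathcal F'}\frac1{\sqrt n}\Bigl|\sum_{j=1}^n\bar f(\xi_j)\Bigr|>A^{4/3}\sqrt n\sigma^2\Bigr)\le e^{-A^{2/3}n\sigma^2},
\end{aligned}$$

i.e. relation (4.6') holds.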

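Formula (4.5) and the definition (4.6) of the set $H$ yield the estimate

$$P(f,A\,|\,\xi_1,\dots,\xi_n)\le Ce^{-BA^{2/3}n\sigma^2/40}\quad\text{if }(\xi_1,\dots,\xi_n)\notin H\tag{4.7}$$

for all $f\in\mathcal F$ and $A>T$ for the conditional probability $P(f,A\,|\,\xi_1,\dots,\xi_n)$. Let us introduce the conditional probability

$$P(\mathcal F,A\,|\,\xi_1,\dots,\xi_n)=P\Bigl(\sup_{f\in\mathcal F}\frac1{\sqrt n}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|>\frac A3\sqrt n\sigma^2\,\Big|\,\xi_1,\dots,\xi_n\Bigr)$$

for all $(\xi_1,\dots,\xi_n)$ and $A>T$. We shall estimate this conditional probability with the help of relation (4.7) if $(\xi_1,\dots,\xi_n)\notin H$. Given some set of $n$ points $(x_1,\dots,x_n)$ in the space $(X,\mathcal X)$ let us introduce the measure $\nu=\nu(x_1,\dots,x_n)$ on $(X,\mathcal X)$ in such a way that $\nu$ is concentrated in the points $x_1,\dots,x_n$, and $\nu(\{x_j\})=\frac1n$. If $\int f^2(u)\,\nu(du)\le\delta^2$ for a function $f$, then $\Bigl|\frac1{\sqrt n}\sum_{j=1}^n\varepsilon_jf(x_j)\Bigr|\le n^{1/2}\int|f(u)|\,\nu(du)\le n^{1/2}\delta$. Since we have assumed that $n\sigma^2\ge1$, this estimate implies that if $f$ and $g$ are two functions such that $\int(f-g)^2\,\nu(dx)\le\delta^2$ with $\delta=\frac A{6n}$, then

$$\Bigl|\frac1{\sqrt n}\sum_{j=1}^n\varepsilon_jf(x_j)-\frac1{\sqrt n}\sum_{j=1}^n\varepsilon_jg(x_j)\Bigr|\le\frac A{6\sqrt n}\le\frac A6\sqrt n\sigma^2.$$
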
< < 
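Given some (random) point $(\xi_1,\dots,\xi_n)\notin H$ let us consider the measure $\nu=\nu(\xi_1,\dots,\xi_n)$ corresponding to it, and choose a $\bar\delta$-dense subset $\{f_1,\dots,f_m\}$ of $\mathcal F$ in the space $L_2(X,\mathcal X,\nu)$ with $\bar\delta=\frac1{6n}\le\delta=\frac A{6n}$, whose cardinality $m$ satisfies the inequality $m\le D\bar\delta^{-L}$. This is possible because of the $L_2$-dense property of the class $\mathcal F$. (This is the point where the $L_2$-dense property of the class of functions $\mathcal F$ is exploited in its full strength.) The above facts imply that $P(\mathcal F,A\,|\,\xi_1,\dots,\xi_n)\le\sum_{l=1}^mP(f_l,A\,|\,\xi_1,\dots,\xi_n)$ with these functions $f_1,\dots,f_m$. Hence relation (4.7) yields that

$$P(\mathcal F,A\,|\,\xi_1,\dots,\xi_n)\le CD(6n)^Le^{-BA^{2/3}n\sigma^2/40}\quad\text{if }(\xi_1,\dots,\xi_n)\notin H\text{ and }A>T.$$

This inequality together with Lemma 3 and estimate (4.6') imply that

$$\begin{aligned}
P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|>An^{1/2}\sigma^2\Bigr)&\le4P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|>\frac A3n^{1/2}\sigma^2\Bigr)\\
&\le4CD(6n)^Le^{-BA^{2/3}n\sigma^2/40}+4e^{-A^{2/3}n\sigma^2}\quad\text{if }A>T.
\end{aligned}\tag{4.8}$$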



Since we have a better power of $A$ in the exponent at the right-hand side of formula (4.8) than we need, the relation $n\sigma^2>K((L+\beta)\log n+1)$ holds, and we have the right to choose the constants $K$ and $A_0$, $A>A_0$, sufficiently large, it is not difficult to deduce relation (3.8) from relation (4.8). Indeed, the expression in the exponent at the right-hand side of (4.8) satisfies the inequality $\frac B{40}A^{2/3}n\sigma^2\ge A^{1/2}n\sigma^2+K((L+\beta)\log n+1)$ if $A_0$ is sufficiently large, and

$$P\Bigl(\frac1{\sqrt n}\sup_{f\in\mathcal F}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|>An^{1/2}\sigma^2\Bigr)\le4C(6n)^{\beta+L}e^{-K}n^{-K(L+\beta)}e^{-A^{1/2}n\sigma^2}+4e^{-A^{2/3}n\sigma^2}\le e^{-A^{1/2}n\sigma^2}$$

if $A>T$, and the constants $A_0$ and $K$ are chosen sufficiently large.

5. The symmetrization argument 

In the proof of Propositions 4 and 5 we need two symmetrization results for all k > 1 
which play the same role as Lemma 3 in the case k = 1. These results are described in 
Lemmas 4A and 4B. In this section these results are formulated and proved. The proofs 
go by induction with respect to k. During the proof of Propositions 4 and 5 for k we 
may assume that they hold for $k'<k$.

Lemma 4A. Let $\mathcal F$ be a class of functions on the space $(X^k,\mathcal X^k)$ which satisfies the conditions of Proposition 4 with some probability measure $\mu$. Let us have $k$ independent copies $\xi_{1,s},\dots,\xi_{n,s}$, $1\le s\le k$, of a sequence of independent $\mu$ distributed random variables $\xi_1,\dots,\xi_n$, and a sequence of independent random variables $\varepsilon=(\varepsilon_1,\dots,\varepsilon_n)$, $P(\varepsilon_s=1)=P(\varepsilon_s=-1)=\frac12$, which is independent also of the random variables $\xi_{j,s}$, $1\le j\le n$, $1\le s\le k$. Consider the independent $U$-statistics $\bar I_{n,k}(f)$, $f\in\mathcal F$, defined from these random variables by formula (3.6) and their randomized version

$$\bar I^{\varepsilon}_{n,k}(f)=\frac{1}{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}\varepsilon_{j_1}\cdots\varepsilon_{j_k}f(\xi_{j_1,1},\dots,\xi_{j_k,k}),\quad f\in\mathcal F. \tag{5.1}$$

There exists some constant $A_0=A_0(k)$ such that the inequality

$$P\Bigl(\sup_{f\in\mathcal F}n^{-k/2}|\bar I_{n,k}(f)|>An^{k/2}\sigma^{k+1}\Bigr)<2^{k+1}P\Bigl(\sup_{f\in\mathcal F}|\bar I^{\varepsilon}_{n,k}(f)|>2^{-(k+1)}An^{k}\sigma^{k+1}\Bigr)+2^{k}n^{k-1}e^{-A^{1/(2k-1)}n\sigma^2/k} \tag{5.2}$$

holds for all $A\ge A_0$.
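The two statistics compared in Lemma 4A can be made concrete with a small computation. The following is only a sketch under my own assumptions: the function name `decoupled_u_statistic`, the toy kernel and the sample values are hypothetical, and the layout `xi[s][j]` for the variable with indices $(j+1,s+1)$ is a convention chosen here for illustration.

```python
from itertools import permutations
from math import factorial


def decoupled_u_statistic(f, xi, eps=None):
    """Return (1/k!) * sum over pairwise different (j_1..j_k) of
    eps[j_1]*...*eps[j_k] * f(xi[0][j_1], ..., xi[k-1][j_k]).

    xi[s][j] stands for the variable with indices (j+1, s+1);
    eps=None gives the non-randomized statistic.
    """
    k, n = len(xi), len(xi[0])
    total = 0.0
    for js in permutations(range(n), k):  # all k-tuples with pairwise different entries
        sign = 1
        if eps is not None:
            for j in js:
                sign *= eps[j]
        total += sign * f(*(xi[s][js[s]] for s in range(k)))
    return total / factorial(k)


f = lambda x, y: x * y                       # toy kernel (assumption), k = 2
xi = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]  # two independent "copies", n = 3
plain = decoupled_u_statistic(f, xi)
mixed = decoupled_u_statistic(f, xi, eps=[1, -1, 1])
```

For $k=2$ changing the sign of every $\varepsilon_j$ leaves the randomized statistic unchanged, because each term carries a product of two signs.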

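Before formulating Lemma 4B needed in the proof of Proposition 5 I introduce some notations. Some of them will be needed later.

Let us consider a class $\mathcal F$ of functions $f(x_1,\dots,x_k,y)$ on a space $(X^k\times Y,\mathcal X^k\times\mathcal Y,\mu^k\times\rho)$ which satisfies the conditions of Proposition 5. Let us choose $2k$ independent copies $\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s}$, $\xi^{(-1)}_{1,s},\dots,\xi^{(-1)}_{n,s}$, $1\le s\le k$, of a sequence of independent $\mu$ distributed random variables $\xi_1,\dots,\xi_n$ together with a sequence of independent random variables $(\varepsilon_1,\dots,\varepsilon_n)$, $P(\varepsilon_s=1)=P(\varepsilon_s=-1)=\frac12$, $1\le s\le n$, which are independent of them. For all subsets $V\subset\{1,\dots,k\}$ of the set $\{1,\dots,k\}$ let $|V|$ denote the cardinality of this set, and define for all functions $f(x_1,\dots,x_k,y)\in\mathcal F$ and $V\subset\{1,\dots,k\}$ the independent $U$-statistics

$$\bar I^{V}_{n,k}(f,y)=\frac{1}{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}f(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k},y), \tag{5.3}$$

where $\delta_s=\pm1$, $1\le s\le k$, $\delta_s=1$ if $s\in V$, and $\delta_s=-1$ if $s\notin V$, together with the random variables

$$H^{V}_{n,k}(f)=\int\bar I^{V}_{n,k}(f,y)^2\rho(dy),\quad f\in\mathcal F. \tag{5.3'}$$

Put

$$\bar I_{n,k}(f,y)=\bar I^{\{1,\dots,k\}}_{n,k}(f,y),\quad H_{n,k}(f)=H^{\{1,\dots,k\}}_{n,k}(f), \tag{5.3''}$$

i.e. these random variables appear if $V=\{1,\dots,k\}$ is taken in the previous definitions, and the random variables $\xi^{(1)}_{j,s}$, $1\le j\le n$, $1\le s\le k$, are inserted in the formulas defining these random variables.

Let us also define the 'randomized version' of the random variables $\bar I^{V}_{n,k}(f,y)$ and $H^{V}_{n,k}(f)$ as

$$\bar I^{(V,\varepsilon)}_{n,k}(f,y)=\frac{1}{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}\varepsilon_{j_1}\cdots\varepsilon_{j_k}f(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k},y),\quad f\in\mathcal F, \tag{5.4}$$

where $\delta_s=1$ if $s\in V$, and $\delta_s=-1$ if $s\notin V$, and

$$H^{(V,\varepsilon)}_{n,k}(f)=\int\bar I^{(V,\varepsilon)}_{n,k}(f,y)^2\rho(dy),\quad f\in\mathcal F. \tag{5.4'}$$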

Let us also introduce the random variables

$$W(f)=\int\Bigl[\sum_{V\subset\{1,\dots,k\}}(-1)^{|V|}\bar I^{(V,\varepsilon)}_{n,k}(f,y)\Bigr]^2\rho(dy),\quad f\in\mathcal F. \tag{5.5}$$

Now I formulate the symmetrization result applied in the proof of Proposition 5.

Lemma 4B. Let $\mathcal F$ be a set of functions on $(X^k\times Y,\mathcal X^k\times\mathcal Y)$ which satisfies the conditions of Proposition 5 with some probability measure $\mu^k\times\rho$. Let us have $2k$ independent copies $\xi^{(\pm1)}_{1,s},\dots,\xi^{(\pm1)}_{n,s}$, $1\le s\le k$, of a sequence of independent $\mu$ distributed random variables $\xi_1,\dots,\xi_n$ together with a sequence of independent random variables $\varepsilon_1,\dots,\varepsilon_n$, $P(\varepsilon_s=1)=P(\varepsilon_s=-1)=\frac12$, $1\le s\le n$, which is independent also of the previously considered sequences.

There exists some $A_0=A_0(k)$ such that if the integrals $H_{n,k}(f)$, $f\in\mathcal F$, determined by this class of functions $\mathcal F$ have a good tail behaviour at level $T^{(2k+1)/2k}$ for some $T\ge A_0$ (this property was defined at the end of Section 3), then the inequality

$$P\Bigl(\sup_{f\in\mathcal F}H_{n,k}(f)>A^2n^{2k}\sigma^{2(k+1)}\Bigr)<2P\Bigl(\sup_{f\in\mathcal F}|W(f)|>\frac{A^2}{2}n^{2k}\sigma^{2(k+1)}\Bigr)+2^{2k+1}n^{k-1}e^{-A^{1/2k}n\sigma^2/k} \tag{5.6}$$

holds with the random variables $H_{n,k}(f)$ and $W(f)$ defined in formulas (5.3'') and (5.5) for all $A\ge T$.

Let us observe that in the symmetrization argument of Lemma 4B we have applied the symmetrization $\bar I^{(V,\varepsilon)}_{n,k}(f,y)$ of $\bar I^{V}_{n,k}(f,y)$ (compare formulas (5.3) and (5.4)), and compared the integral of the square of the random function $\bar I_{n,k}(f,y)$ with the integral of the square of a linear combination of the random functions $\bar I^{(V,\varepsilon)}_{n,k}(f,y)$. After this integration the effect of the 'randomizing factors' $\varepsilon_j$ will be weaker. Nevertheless, such an estimate will also be sufficient for us. But the effect of this symmetrization procedure has to be followed more carefully. Hence a corollary of Lemma 4B will be presented which can be better applied than the original lemma. We get it by rewriting the random variable $W(f)$ defined in (5.5) in another form with the help of some diagrams introduced below.

Let $\mathcal G=\mathcal G(k)$ denote the set of all diagrams consisting of two rows such that both rows are the set $\{1,\dots,k\}$ and the diagrams of $\mathcal G$ contain some edges $(l_1,l'_1),\dots,(l_s,l'_s)$, $0\le s\le k$, connecting some points (vertices) of the first row with some point (vertex) of the second row. The vertices $l_1,\dots,l_s$ in the first row are all different, and the same relation holds also for the vertices $l'_1,\dots,l'_s$ in the second row. For each diagram $G\in\mathcal G$ let us define $e(G)=\{(l_1,l'_1),\dots,(l_s,l'_s)\}$, the set of its edges, $v_1(G)=\{l_1,\dots,l_s\}$, the set of those vertices in the first row from which an edge starts, and $v_2(G)=\{l'_1,\dots,l'_s\}$, the set of those vertices in the second row from which an edge starts.

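Given some diagram $G\in\mathcal G$ and two sets $V_1,V_2\subset\{1,\dots,k\}$, we define with the help of the random variables $\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s}$, $\xi^{(-1)}_{1,s},\dots,\xi^{(-1)}_{n,s}$, $1\le s\le k$, and $\varepsilon=(\varepsilon_1,\dots,\varepsilon_n)$ taking part in the definition of the expressions $W(f)$, $f\in\mathcal F$, the random variables $H_{n,k}(f|G,V_1,V_2)$:

$$H_{n,k}(f|G,V_1,V_2)=\sum_{\substack{(j_1,\dots,j_k,\,j'_1,\dots,j'_k)\\ 1\le j_s\le n,\ j_s\ne j_{s'}\text{ if }s\ne s',\ 1\le s,s'\le k,\\ 1\le j'_s\le n,\ j'_s\ne j'_{s'}\text{ if }s\ne s',\ 1\le s,s'\le k,\\ j_s=j'_{s'}\text{ if }(s,s')\in e(G),\ j_s\ne j'_{s'}\text{ if }(s,s')\notin e(G)}}\ \prod_{s\in\{1,\dots,k\}\setminus v_1(G)}\varepsilon_{j_s}\prod_{s\in\{1,\dots,k\}\setminus v_2(G)}\varepsilon_{j'_s}\,\frac{1}{k!^2}\int f(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k},y)f(\xi^{(\bar\delta_1)}_{j'_1,1},\dots,\xi^{(\bar\delta_k)}_{j'_k,k},y)\rho(dy), \tag{5.7}$$

where $\delta_s=1$ if $s\in V_1$, $\delta_s=-1$ if $s\notin V_1$, and $\bar\delta_s=1$ if $s\in V_2$, $\bar\delta_s=-1$ if $s\notin V_2$. With the help of these random variables we can write that

$$W(f)=\sum_{G\in\mathcal G,\ V_1,V_2\subset\{1,\dots,k\}}(-1)^{|V_1|+|V_2|}H_{n,k}(f|G,V_1,V_2)\quad\text{for all }f\in\mathcal F,$$

because

$$\int\bar I^{(V_1,\varepsilon)}_{n,k}(f,y)\bar I^{(V_2,\varepsilon)}_{n,k}(f,y)\rho(dy)=\sum_{G\in\mathcal G}H_{n,k}(f|G,V_1,V_2)\quad\text{for all }V_1,V_2\subset\{1,\dots,k\}.$$

Since the number of terms in the sum representing $W(f)$ is less than $2^{4k}k!$, it implies that Lemma 4B has the following corollary: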

Corollary of Lemma 4B. Let a set of functions $\mathcal F$ satisfy the conditions of Proposition 5. Then there exists some $A_0=A_0(k)$ such that if the integrals $H_{n,k}(f)$, $f\in\mathcal F$, determined by this class of functions $\mathcal F$ have a good tail behaviour at level $T^{(2k+1)/2k}$ for some $T\ge A_0$, then the inequality

$$P\Bigl(\sup_{f\in\mathcal F}H_{n,k}(f)>A^2n^{2k}\sigma^{2(k+1)}\Bigr)\le2\sum_{G\in\mathcal G,\ V_1,V_2\subset\{1,\dots,k\}}P\Bigl(\sup_{f\in\mathcal F}|H_{n,k}(f|G,V_1,V_2)|>\frac{A^2}{2^{4k+1}k!}n^{2k}\sigma^{2(k+1)}\Bigr)+2^{2k+1}n^{k-1}e^{-A^{1/2k}n\sigma^2/k} \tag{5.8}$$

holds with the random variables $H_{n,k}(f)$ and $H_{n,k}(f|G,V_1,V_2)$ defined in formulas (5.3'') and (5.7) for all $A\ge T$.
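The counting bound $2^{4k}k!$ for the number of triples $(G,V_1,V_2)$ can be checked by brute force for small $k$. The sketch below is only an illustration: the helper names `diagrams` and `number_of_terms` are mine, and I assume that the diagram without edges belongs to $\mathcal G(k)$.

```python
from itertools import combinations, permutations
from math import comb, factorial


def diagrams(k):
    """Yield every diagram on two rows {1..k} as a frozenset of edges (l, l')."""
    rows = range(1, k + 1)
    for s in range(k + 1):                        # number of edges
        for top in combinations(rows, s):         # different first-row end-points
            for bottom in permutations(rows, s):  # different second-row end-points
                yield frozenset(zip(top, bottom))


def number_of_terms(k):
    """Count the triples (G, V1, V2) with V1, V2 subsets of {1..k}."""
    return sum(1 for _ in diagrams(k)) * 4 ** k


for k in range(1, 5):
    all_diagrams = list(diagrams(k))
    assert len(all_diagrams) == len(set(all_diagrams))  # no diagram is listed twice
    assert len(all_diagrams) == sum(comb(k, s) ** 2 * factorial(s) for s in range(k + 1))
    assert number_of_terms(k) < 2 ** (4 * k) * factorial(k)
```

The closed form used in the check, $\sum_{s=0}^{k}\binom{k}{s}^2s!$, is at most $k!\binom{2k}{k}\le k!\,2^{2k}$, which together with the $2^{2k}$ choices of $(V_1,V_2)$ explains the bound.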
The proof of Lemmas 4A and 4B uses the result of the following Lemma 5 which states that certain random vectors have the same distribution.

Lemma 5. Let $\varepsilon=(\varepsilon_1,\dots,\varepsilon_n)$ be a sequence of independent random variables, $P(\varepsilon_s=1)=P(\varepsilon_s=-1)=\frac12$, $1\le s\le n$, which is independent also of $2k$ fixed independent copies $\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s}$ and $\xi^{(-1)}_{1,s},\dots,\xi^{(-1)}_{n,s}$, $1\le s\le k$, of a sequence $\xi_1,\dots,\xi_n$ of independent $\mu$ distributed random variables.

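a) Let $\mathcal F$ be a class of functions which satisfies the conditions of Proposition 4. With the help of the above random variables introduce the independent $U$-statistic

$$\bar I^{V}_{n,k}(f)=\frac{1}{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}f(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k}) \tag{5.9}$$

for all sets $V\subset\{1,\dots,k\}$ and functions $f\in\mathcal F$ together with its 'randomized version'

$$\bar I^{(V,\varepsilon)}_{n,k}(f)=\frac{1}{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}\varepsilon_{j_1}\cdots\varepsilon_{j_k}f(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k}), \tag{5.9'}$$

where $\delta_s=\pm1$, $1\le s\le k$, $\delta_s=1$ if $s\in V$, and $\delta_s=-1$ if $s\notin V$. Then the sets of random variables

$$S(f)=\sum_{V\subset\{1,\dots,k\}}(-1)^{|V|}\bar I^{V}_{n,k}(f),\quad f\in\mathcal F, \tag{5.10}$$

and the sets of random variables

$$\bar S(f)=\sum_{V\subset\{1,\dots,k\}}(-1)^{|V|}\bar I^{(V,\varepsilon)}_{n,k}(f),\quad f\in\mathcal F, \tag{5.10'}$$

have the same joint distribution.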

b) Let $\mathcal F$ be a class of functions satisfying the conditions of Proposition 5. For all functions $f\in\mathcal F$ and $V\subset\{1,\dots,k\}$ consider the independent $U$-statistics $\bar I^{V}_{n,k}(f,y)$ determined by the random variables $\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s}$ and $\xi^{(-1)}_{1,s},\dots,\xi^{(-1)}_{n,s}$, $1\le s\le k$, by formula (5.3), and define with their help the random variables

$$\bar W(f)=\int\Bigl[\sum_{V\subset\{1,\dots,k\}}(-1)^{|V|}\bar I^{V}_{n,k}(f,y)\Bigr]^2\rho(dy),\quad f\in\mathcal F. \tag{5.11}$$

Then the random vectors $\{\bar W(f):f\in\mathcal F\}$ defined in (5.11) and $\{W(f):f\in\mathcal F\}$ defined in (5.5) have the same distribution.
Proof of Lemma 5. Let us consider Part a) of Lemma 5. I claim that for all $M\subset\{1,\dots,n\}$ the conditional distribution of the random vector in (5.10') under the condition that $\varepsilon_j=1$ if $j\in M$ and $\varepsilon_j=-1$ if $j\in\{1,\dots,n\}\setminus M$ agrees with the distribution of the vector in (5.10). Since the distribution of the vector in (5.10) does not change if we exchange the random variables $\xi^{(1)}_{j,s}$ and $\xi^{(-1)}_{j,s}$ in it if $j\notin M$, $1\le s\le k$, and do not exchange them otherwise, it is enough to understand that the random vector we get from the vector in (5.10) after this transformation agrees with the random vector in (5.10') if we write $\varepsilon_j=1$ for $j\in M$ and $\varepsilon_j=-1$ for $j\notin M$ in it. These random vectors really agree (not only in distribution) since for all functions $f\in\mathcal F$ both vectors have a component which is the sum of terms of the form $f(\xi^{(\delta_{j_1})}_{j_1,1},\dots,\xi^{(\delta_{j_k})}_{j_k,k})$, $\delta_{j_s}=\pm1$, $1\le s\le k$, multiplied with an appropriate power of $(-1)$, and this power equals the number of $-1$ components in the sequence $\delta_{j_1},\dots,\delta_{j_k}$ plus the cardinality of the set $\{j_1,\dots,j_k\}\cap M$. Part b) of the lemma can be proved in the same way, hence it is omitted.
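The pathwise identity behind Part a) can be tested numerically on fixed arrays. The following sketch is only an illustration under my own assumptions: `signed_sum` and `swap_where_negative` are hypothetical helper names, the kernel is an arbitrary toy function, and `xi_plus[s][j]`, `xi_minus[s][j]` play the role of $\xi^{(1)}_{j+1,s+1}$ and $\xi^{(-1)}_{j+1,s+1}$.

```python
from itertools import permutations, product
from math import factorial
import random


def signed_sum(f, xi_plus, xi_minus, eps=None):
    """sum over V of (-1)^{|V|} (1/k!) sum over pairwise different (j_1..j_k) of
    [eps_{j_1}...eps_{j_k}] f(xi^{(delta_1)}_{j_1,1}, ..., xi^{(delta_k)}_{j_k,k})."""
    k, n = len(xi_plus), len(xi_plus[0])
    total = 0.0
    for delta in product((1, -1), repeat=k):      # delta_s = 1 exactly when s is in V
        size_v = sum(1 for d in delta if d == 1)
        for js in permutations(range(n), k):
            coeff = (-1) ** size_v
            if eps is not None:
                for j in js:
                    coeff *= eps[j]
            args = [(xi_plus if delta[s] == 1 else xi_minus)[s][js[s]] for s in range(k)]
            total += coeff * f(*args)
    return total / factorial(k)


def swap_where_negative(xi_plus, xi_minus, eps):
    """Exchange the (+1) and (-1) copies at every index j with eps[j] == -1."""
    k, n = len(xi_plus), len(xi_plus[0])
    new_plus = [[xi_plus[s][j] if eps[j] == 1 else xi_minus[s][j] for j in range(n)]
                for s in range(k)]
    new_minus = [[xi_minus[s][j] if eps[j] == 1 else xi_plus[s][j] for j in range(n)]
                 for s in range(k)]
    return new_plus, new_minus


random.seed(0)
k, n = 2, 4
xi_plus = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
xi_minus = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
f = lambda x, y: x * x * y - 3 * y + x            # arbitrary toy kernel (assumption)

max_gap = 0.0
for eps in product((1, -1), repeat=n):
    swapped_plus, swapped_minus = swap_where_negative(xi_plus, xi_minus, eps)
    lhs = signed_sum(f, swapped_plus, swapped_minus)   # (5.10) after the exchange
    rhs = signed_sum(f, xi_plus, xi_minus, eps)        # (5.10') with the fixed signs
    max_gap = max(max_gap, abs(lhs - rhs))
```

The two sides agree for every sign pattern up to rounding, which is the "not only in distribution" statement of the proof in this toy setting.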

Lemma 4A will be proved with the help of part a) of Lemma 5 and the following Lemma 6A.

Lemma 6A. Let us consider a class of functions $\mathcal F$ satisfying the conditions of Proposition 4, and the random variables $\bar I^{V}_{n,k}(f)$, $f\in\mathcal F$, $V\subset\{1,\dots,k\}$, defined in formula (5.9). Let $\mathcal B=\mathcal B(\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s};\ 1\le s\le k)$ denote the $\sigma$-algebra generated by the random variables $\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s}$, $1\le s\le k$, taking part in the definition of the random variables $\bar I^{V}_{n,k}(f)$. For all $V\subset\{1,\dots,k\}$, $V\ne\{1,\dots,k\}$, there exists a number $A_0=A_0(k)>0$ such that the inequality

$$P\Bigl(\sup_{f\in\mathcal F}E(\bar I^{V}_{n,k}(f)^2\,|\,\mathcal B)>2^{-(3k+3)}A^2n^{2k}\sigma^{2k+2}\Bigr)<n^{k-1}e^{-A^{1/(2k-1)}n\sigma^2/k} \tag{5.12}$$

holds for all $A\ge A_0$.

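Proof of Lemma 6A. Let us first consider the case $V=\emptyset$. Then $\bar I^{\emptyset}_{n,k}(f)$ depends only on the random variables $\xi^{(-1)}_{j,s}$, which are independent of $\mathcal B$, hence $E(\bar I^{\emptyset}_{n,k}(f)^2\,|\,\mathcal B)=E(\bar I^{\emptyset}_{n,k}(f)^2)\le\frac{n^k}{k!}\sigma^2\le n^{2k}\sigma^{2k+2}$ for all $f\in\mathcal F$. In the above calculation we exploited that the functions $f\in\mathcal F$ are canonical, and this implies certain orthogonalities, and beside this the inequality $n\sigma^2\ge1$ holds. The above relation implies inequality (5.12) for $V=\emptyset$ for all $\omega\in\Omega$ if the number $A_0$ is chosen sufficiently large.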

To avoid some complications in the notation let us restrict our attention to the sets $V=\{1,\dots,u\}$, $1\le u<k$, and prove relation (5.12) for such sets. For this goal let us introduce the random variables

$$\bar I^{V}_{n,k}(f,j_{u+1},\dots,j_k)=\frac{1}{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,u\\ j_s\ne j_{s'}\text{ if }s\ne s',\ 1\le s,s'\le k}}f(\xi^{(1)}_{j_1,1},\dots,\xi^{(1)}_{j_u,u},\xi^{(-1)}_{j_{u+1},u+1},\dots,\xi^{(-1)}_{j_k,k})$$

for all $f\in\mathcal F$, i.e. we fix some indices $j_{u+1},\dots,j_k$, $1\le j_s\le n$, $u+1\le s\le k$, $j_s\ne j_{s'}$ if $s\ne s'$, and sum up only those terms in the sum defining $\bar I^{V}_{n,k}(f)$ which contain $\xi^{(-1)}_{j_{u+1},u+1},\dots,\xi^{(-1)}_{j_k,k}$ in their last $k-u$ coordinates. Then we can write

$$E(\bar I^{V}_{n,k}(f)^2\,|\,\mathcal B)=E\Bigl(\Bigl(\sum_{\substack{1\le j_s\le n,\\ s=u+1,\dots,k}}\bar I^{V}_{n,k}(f,j_{u+1},\dots,j_k)\Bigr)^2\,\Big|\,\mathcal B\Bigr)=\sum_{\substack{1\le j_s\le n,\\ s=u+1,\dots,k}}E(\bar I^{V}_{n,k}(f,j_{u+1},\dots,j_k)^2\,|\,\mathcal B). \tag{5.13}$$

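The last relation follows from the identity

$$E(\bar I^{V}_{n,k}(f,j_{u+1},\dots,j_k)\,\bar I^{V}_{n,k}(f,j'_{u+1},\dots,j'_k)\,|\,\mathcal B)=0$$

if $(j_{u+1},\dots,j_k)\ne(j'_{u+1},\dots,j'_k)$, which relation holds, since $f$ is a canonical function. It follows from relation (5.13) that

$$\Bigl\{\omega:\ \sup_{f\in\mathcal F}E(\bar I^{V}_{n,k}(f)^2\,|\,\mathcal B)(\omega)>2^{-(3k+3)}A^2n^{2k}\sigma^{2k+2}\Bigr\}\subset\bigcup_{\substack{1\le j_s\le n,\\ s=u+1,\dots,k}}\Bigl\{\omega:\ \sup_{f\in\mathcal F}E(\bar I^{V}_{n,k}(f,j_{u+1},\dots,j_k)^2\,|\,\mathcal B)(\omega)>\frac{A^2n^{2k}\sigma^{2k+2}}{2^{3k+3}n^{k-u}}\Bigr\}. \tag{5.14}$$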

The probability of the events in the union at the right-hand side of (5.14) can be estimated with the help of the corollary of Proposition 5 with parameter $u<k$ instead of $k$. (We may assume that Proposition 5 holds for $u<k$.) This corollary yields that

$$P\Bigl(\sup_{f\in\mathcal F}E(\bar I^{V}_{n,k}(f,j_{u+1},\dots,j_k)^2\,|\,\mathcal B)>\frac{A^2\sigma^{2k+2}n^{k+u}}{2^{3k+3}}\Bigr)\le e^{-A^{1/(2u+1)}(n-u)\sigma^2}. \tag{5.15}$$



Indeed, the expression $E(\bar I^{V}_{n,k}(f,j_{u+1},\dots,j_k)^2\,|\,\mathcal B)$ can be calculated in the following way: Take the independent $U$-statistic

$$\frac{1}{k!}\sum_{\substack{j_s\in\{1,\dots,n\}\setminus\{j_{u+1},\dots,j_k\},\\ s=1,\dots,u,\ j_s\ne j_{s'}\text{ if }s\ne s'}}f(\xi^{(1)}_{j_1,1},\dots,\xi^{(1)}_{j_u,u},x_{u+1},\dots,x_k) \tag{5.16}$$

of order $u$ with sample size $n-k+u$, and integrate the square of this function with respect to the variables $x_{u+1},\dots,x_k$ by the measure $\mu^{k-u}$. Hence the expression at the left-hand side of (5.15) can be bounded by means of Proposition 5 if we apply it for our class of functions $\mathcal F$ considering them as functions on $(X^u\times Y,\mathcal X^u\times\mathcal Y,\mu^u\times\rho)$ with $(Y,\mathcal Y,\rho)=(X^{k-u},\mathcal X^{k-u},\mu^{k-u})$. (A small inaccuracy was committed in the above statement because to define the expression in (5.16) as a $U$-statistic we should have divided by $u!$ instead of $k!$. But this causes no real problem.)

We get inequality (5.15) from Proposition 5 by replacing the level $\frac{A^2\sigma^{2k+2}n^{k+u}}{2^{3k+3}}$ in the probability at the left-hand side by $A^2(n-u)^{2u}\sigma^{2u+2}\le\frac{A^2\sigma^{2k+2}n^{k+u}}{2^{3k+3}}$. The last inequality really holds if the constant $K$ is chosen sufficiently large in the condition $n\sigma^2\ge K\log n$ of Proposition 4.

Relations (5.14) and (5.15) imply that

$$P\Bigl(\sup_{f\in\mathcal F}E(\bar I^{V}_{n,k}(f)^2\,|\,\mathcal B)>2^{-(3k+3)}A^2n^{2k}\sigma^{2k+2}\Bigr)\le n^{k-u}e^{-A^{1/(2u+1)}(n-u)\sigma^2}$$

and $u\le k-1$. Hence also inequality (5.12) holds.

Now we prove Lemma 4A.

Proof of Lemma 4A. We show with the help of Lemma 2 and Lemma 6A that

$$P\Bigl(\sup_{f\in\mathcal F}n^{-k/2}|\bar I_{n,k}(f)|>An^{k/2}\sigma^{k+1}\Bigr)<2P\Bigl(\sup_{f\in\mathcal F}|S(f)|>\frac{A}{2}n^{k}\sigma^{k+1}\Bigr)+2^{k}n^{k-1}e^{-A^{1/(2k-1)}n\sigma^2/k} \tag{5.17}$$

with the function $S(f)$ defined in (5.10). To prove relation (5.17) introduce the random variables $Z(f)=(-1)^k\bar I^{\{1,\dots,k\}}_{n,k}(f)$ and $\bar Z(f)=\sum\limits_{V\subset\{1,\dots,k\},\ V\ne\{1,\dots,k\}}(-1)^{|V|+1}\bar I^{V}_{n,k}(f)$ for all $f\in\mathcal F$, the $\sigma$-algebra $\mathcal B$ considered in Lemma 6A and the set

$$B=\bigcap_{\substack{V\subset\{1,\dots,k\}\\ V\ne\{1,\dots,k\}}}\Bigl\{\omega:\ \sup_{f\in\mathcal F}E(\bar I^{V}_{n,k}(f)^2\,|\,\mathcal B)(\omega)\le2^{-(3k+3)}A^2n^{2k}\sigma^{2k+2}\Bigr\}.$$

Observe that $S(f)=Z(f)-\bar Z(f)$, $f\in\mathcal F$, $B\in\mathcal B$, and by Lemma 6A the inequality $1-P(B)\le2^{k}n^{k-1}e^{-A^{1/(2k-1)}n\sigma^2/k}$ holds. Hence to prove relation (5.17) as a consequence of Lemma 2 it is enough to show that

$$P\Bigl(|\bar Z(f)|>\frac{A}{2}n^{k}\sigma^{k+1}\,\Big|\,\mathcal B\Bigr)(\omega)\le\frac12\quad\text{for all }f\in\mathcal F\text{ if }\omega\in B. \tag{5.18}$$

But $P(|\bar I^{V}_{n,k}(f)|>2^{-(k+1)}An^{k}\sigma^{k+1}\,|\,\mathcal B)(\omega)\le2^{-(k+1)}$ for all $f\in\mathcal F$ and $V\ne\{1,\dots,k\}$ if $\omega\in B$ by the 'conditional Chebyshev inequality', hence relation (5.18) holds, since the sum defining $\bar Z(f)$ has less than $2^k$ terms.

Lemma 4A follows from relation (5.17), part a) of Lemma 5 and the observation that the random vectors $\{\bar I^{(V,\varepsilon)}_{n,k}(f)\}$, $f\in\mathcal F$, defined in (5.9') have the same distribution for all $V\subset\{1,\dots,k\}$ as the random vector $\bar I^{\varepsilon}_{n,k}(f)$, $f\in\mathcal F$, considered in the formulation of Lemma 4A. Hence

$$P\Bigl(\sup_{f\in\mathcal F}|S(f)|>\frac{A}{2}n^{k}\sigma^{k+1}\Bigr)\le2^{k}P\Bigl(\sup_{f\in\mathcal F}|\bar I^{\varepsilon}_{n,k}(f)|>2^{-(k+1)}An^{k}\sigma^{k+1}\Bigr).$$

In the proof of Lemma 4B we apply the following Lemma 6B which is a version of Lemma 6A.

Lemma 6B. Let us consider a class of functions $\mathcal F$ satisfying the conditions of Proposition 5 and the random variables $\bar I^{V}_{n,k}(f,y)$, $f\in\mathcal F$, $V\subset\{1,\dots,k\}$, defined in formula (5.3). Let $\mathcal B=\mathcal B(\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s};\ 1\le s\le k)$ denote the $\sigma$-algebra generated by the random variables $\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s}$, $1\le s\le k$, taking part in the definition of the random variables $\bar I^{V}_{n,k}(f,y)$ and $H^{V}_{n,k}(f)$.

a) For all $V\subset\{1,\dots,k\}$, $V\ne\{1,\dots,k\}$, there exists a number $A_0=A_0(k)>0$ such that the inequality

$$P\Bigl(\sup_{f\in\mathcal F}E(H^{V}_{n,k}(f)\,|\,\mathcal B)>2^{-(4k+4)}A^{(2k-1)/k}n^{2k}\sigma^{2k+2}\Bigr)<n^{k-1}e^{-A^{1/2k}n\sigma^2/k} \tag{5.19}$$

holds for all $A\ge A_0$.

b) Given two subsets $V_1,V_2\subset\{1,\dots,k\}$ of the set $\{1,\dots,k\}$ define the random integrals

$$H^{(V_1,V_2)}_{n,k}(f)=\int|\bar I^{V_1}_{n,k}(f,y)\bar I^{V_2}_{n,k}(f,y)|\rho(dy),\quad f\in\mathcal F,$$

with the help of the functions $\bar I^{V}_{n,k}(f,y)$ defined in (5.3). If at least one of the sets $V_1$ and $V_2$ is not the set $\{1,\dots,k\}$, then there exists some number $A_0=A_0(k)>0$ such that if the integrals $H_{n,k}(f)$, $f\in\mathcal F$, determined by this class of functions $\mathcal F$ have a good tail behaviour at level $T^{(2k+1)/2k}$ for some $T\ge A_0$, then the inequality

$$P\Bigl(\sup_{f\in\mathcal F}E(H^{(V_1,V_2)}_{n,k}(f)\,|\,\mathcal B)>2^{-(2k+2)}A^2n^{2k}\sigma^{2k+2}\Bigr)<2n^{k-1}e^{-A^{1/2k}n\sigma^2/k} \tag{5.20}$$

holds for all $A\ge T$.

Proof of Lemma 6B. Part a) of Lemma 6B can be proved in the same way as Lemma 6A, only the formulas applied in the proof become a little bit more complicated. Hence I omit the proof. (The difference between the power of the parameter $A$ at the right-hand side of formulas (5.19) and (5.12) appears, since the left-hand side of (5.19) contains the term $A^{(2k-1)/k}$ and not $A^2$.) Part b) will be proved with the help of Part a) and the inequality

$$\sup_{f\in\mathcal F}E(H^{(V_1,V_2)}_{n,k}(f)\,|\,\mathcal B)\le\Bigl(\sup_{f\in\mathcal F}E(H^{V_1}_{n,k}(f)\,|\,\mathcal B)\Bigr)^{1/2}\Bigl(\sup_{f\in\mathcal F}E(H^{V_2}_{n,k}(f)\,|\,\mathcal B)\Bigr)^{1/2}$$

which follows from the Schwarz inequality applied for integrals with respect to conditional distributions. Let us assume that $V_1\ne\{1,\dots,k\}$. The last inequality implies that

$$P\Bigl(\sup_{f\in\mathcal F}E(H^{(V_1,V_2)}_{n,k}(f)\,|\,\mathcal B)>2^{-(2k+2)}A^2n^{2k}\sigma^{2k+2}\Bigr)\le P\Bigl(\sup_{f\in\mathcal F}E(H^{V_1}_{n,k}(f)\,|\,\mathcal B)>2^{-(4k+4)}A^{(2k-1)/k}n^{2k}\sigma^{2k+2}\Bigr)+P\Bigl(\sup_{f\in\mathcal F}E(H^{V_2}_{n,k}(f)\,|\,\mathcal B)>A^{(2k+1)/k}n^{2k}\sigma^{2k+2}\Bigr).$$

Hence the estimate (5.19) for $V=V_1$ together with the inequality

$$P\Bigl(\sup_{f\in\mathcal F}E(H^{V_2}_{n,k}(f)\,|\,\mathcal B)>A^{(2k+1)/k}n^{2k}\sigma^{2k+2}\Bigr)\le n^{k-1}e^{-A^{1/2k}n\sigma^2/k}$$

which follows from Part a) if $V_2\ne\{1,\dots,k\}$ (in this case the level $A^{(2k+1)/k}n^{2k}\sigma^{2k+2}$ can be replaced by $2^{-(4k+4)}A^{(2k-1)/k}n^{2k}\sigma^{2k+2}$ in the probability we consider) and from the conditions of Part b) if $V_2=\{1,\dots,k\}$ imply relation (5.20).
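The choice of the two levels in the proof of Part b) may look arbitrary at first sight. The following short calculation, which is my own reading of the step and not a formula taken from the text, shows that the product of the two levels is exactly the square of the level in (5.20), and this is what the splitting of the event needs.

```latex
\[
\Bigl(2^{-(4k+4)}A^{(2k-1)/k}n^{2k}\sigma^{2k+2}\Bigr)
\Bigl(A^{(2k+1)/k}n^{2k}\sigma^{2k+2}\Bigr)
=2^{-(4k+4)}A^{4}n^{4k}\sigma^{4k+4}
=\Bigl(2^{-(2k+2)}A^{2}n^{2k}\sigma^{2k+2}\Bigr)^{2}.
\]
% Consequently, for non-negative numbers a and b with levels t_a t_b = t^2:
\[
a^{1/2}b^{1/2}>t \quad\Longrightarrow\quad a>t_a \ \text{ or } \ b>t_b ,
\]
% because a <= t_a and b <= t_b would give ab <= t_a t_b = t^2.
```

Here $a$ and $b$ stand for the two suprema at the right-hand side of the Schwarz inequality, and $t$, $t_a$, $t_b$ for the three levels; these letters are introduced only for this remark.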

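Now I prove Lemma 4B.

Proof of Lemma 4B. By Part b) of Lemma 5 it is enough to prove that relation (5.6) holds if the random variables $W(f)$ are replaced in it by the random variables $\bar W(f)$ defined in formula (5.11). We shall prove this by applying Lemma 2 with the choice of $Z(f)=H^{V}_{n,k}(f)$, $V=\{1,\dots,k\}$, $\bar Z(f)=\bar W(f)-Z(f)$, $f\in\mathcal F$, $\mathcal B=\mathcal B(\xi^{(1)}_{1,s},\dots,\xi^{(1)}_{n,s};\ 1\le s\le k)$, and the set

$$B=\bigcap_{\substack{(V_1,V_2):\ V_j\subset\{1,\dots,k\},\ j=1,2\\ V_1\ne\{1,\dots,k\}\text{ or }V_2\ne\{1,\dots,k\}}}\Bigl\{\omega:\ \sup_{f\in\mathcal F}E(H^{(V_1,V_2)}_{n,k}(f)\,|\,\mathcal B)(\omega)\le2^{-(2k+2)}A^2n^{2k}\sigma^{2k+2}\Bigr\}.$$

By Lemma 6B $1-P(B)\le2^{2k+1}n^{k-1}e^{-A^{1/2k}n\sigma^2/k}$, and to prove Lemma 4B with the help of Lemma 2 it is enough to show that

$$P\Bigl(|\bar Z(f)|>\frac{A^2}{2}n^{2k}\sigma^{2k+2}\,\Big|\,\mathcal B\Bigr)(\omega)\le\frac12\quad\text{for all }f\in\mathcal F\text{ if }\omega\in B.$$

To prove this relation observe that

$$E(|\bar Z(f)|\,|\,\mathcal B)(\omega)\le\sum_{\substack{(V_1,V_2):\ V_j\subset\{1,\dots,k\},\ j=1,2\\ V_1\ne\{1,\dots,k\}\text{ or }V_2\ne\{1,\dots,k\}}}E(H^{(V_1,V_2)}_{n,k}(f)\,|\,\mathcal B)(\omega)\le\frac{A^2}{4}n^{2k}\sigma^{2k+2}\quad\text{if }\omega\in B$$

for all $f\in\mathcal F$, since the sum has less than $2^{2k}$ terms. Hence the 'conditional Markov inequality' implies that

$$P\Bigl(|\bar Z(f)|>\frac{A^2}{2}n^{2k}\sigma^{2k+2}\,\Big|\,\mathcal B\Bigr)(\omega)\le\frac12\quad\text{if }\omega\in B\text{ and }f\in\mathcal F.$$

Lemma 4B is proved.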
6. The proof of Propositions 4 and 5

The proof of Propositions 4 and 5 for general $k\ge1$ with the help of the symmetrization Lemmas 4A and 4B is similar to the proof of Proposition 4 in the case $k=1$ presented in Section 4. The proof applies an induction procedure with respect to the parameter $k$. In the proof of Proposition 4 for parameter $k$ we may assume that Propositions 4 and 5 hold for $k'<k$. In the proof of Proposition 5 we may also assume that Proposition 4 holds for the parameter $k$.

In the proof of Proposition 4 let us introduce (with the notation of this proposition) the functions

$$S^2_{n,k}(f)(x_{j,s},\ 1\le j\le n,\ 1\le s\le k)=\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\text{ if }s\ne s'}}f^2(x_{j_1,1},\dots,x_{j_k,k}),\quad f\in\mathcal F, \tag{6.1}$$

where $x_{j,s}\in X$, $1\le j\le n$, $1\le s\le k$. Fix some number $A\ge T$ and define the set $H$

$$H=H(A)=\Bigl\{(x_{j,s},\ 1\le j\le n,\ 1\le s\le k):\ \sup_{f\in\mathcal F}S^2_{n,k}(f)(x_{j,s},\ 1\le j\le n,\ 1\le s\le k)>2^kA^{4/3}n^k\sigma^2\Bigr\}. \tag{6.2}$$

We want to show that

$$P(\{\omega:\ (\xi_{j,s}(\omega),\ 1\le j\le n,\ 1\le s\le k)\in H\})\le2^ke^{-A^{2/3k}n\sigma^2}\quad\text{if }A\ge T. \tag{6.3}$$

Relation (6.3) will be proved by means of the Hoeffding decomposition of the $U$-statistics with kernel functions $f^2(x_1,\dots,x_k)$, $f\in\mathcal F$, and by the estimation of the sum this decomposition yields. More explicitly, write

$$f^2(x_1,\dots,x_k)=\sum_{V\subset\{1,\dots,k\}}f_V(x_j,\ j\in V) \tag{6.4}$$

with $f_V(x_j,\ j\in V)=\prod\limits_{j\in\{1,\dots,k\}\setminus V}P_j\prod\limits_{j\in V}Q_jf^2(x_1,\dots,x_k)$, where $P_j$ and $Q_j$ are the operators $P_\mu$ and $Q_\mu$ defined in formulas (2.6) and (2.6'') if $(Y_1\times Z\times Y_2,\mathcal Y_1\times\mathcal Z\times\mathcal Y_2)$ is the $k$-fold product $(X^k,\mathcal X^k)$ of the measure space $(X,\mathcal X)$ in these definitions, $Z$ is the $j$-th component in these products, and $Y_1$ is the product of the components before and $Y_2$ is the product of the components after this component. (Relation (6.4) follows from the identity $f^2=\prod\limits_{j=1}^{k}(P_j+Q_j)f^2$ if the multiplications are carried out in this formula. In the calculation we exploit that the operators $P_j$ and $P_{j'}$ are commutative if $j\ne j'$, and the same relation holds for the pairs $P_j$ and $Q_{j'}$ or $Q_j$ and $P_{j'}$ or $Q_j$ and $Q_{j'}$.)
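The decomposition (6.4) can be tried out on a finite space. The sketch below is only an illustration under my own assumptions: $X$ is a three-point set with a toy probability measure, $P_j$ is taken as integration of the $j$-th coordinate and $Q_j=I-P_j$ (formulas (2.6) and (2.6'') are not reproduced here), and all function names are hypothetical.

```python
from itertools import combinations, product


def integrate_out(g, mu, j):
    """P_j g: integrate coordinate j with respect to mu (the result does not depend on x_j)."""
    points = range(len(mu))
    return {x: sum(mu[z] * g[x[:j] + (z,) + x[j + 1:]] for z in points) for x in g}


def center(g, mu, j):
    """Q_j g = g - P_j g."""
    pg = integrate_out(g, mu, j)
    return {x: g[x] - pg[x] for x in g}


def hoeffding_components(g, mu, k):
    """Return {V: g_V} with g_V = prod_{j not in V} P_j prod_{j in V} Q_j g."""
    components = {}
    for size in range(k + 1):
        for V in combinations(range(k), size):
            h = g
            for j in range(k):
                h = center(h, mu, j) if j in V else integrate_out(h, mu, j)
            components[V] = h
    return components


mu = [0.2, 0.3, 0.5]                          # toy probability measure on X = {0, 1, 2}
k = 2
f = lambda x1, x2: 0.1 * (x1 - 2 * x2) + 0.05 * x1 * x2
g = {x: f(*x) ** 2 for x in product(range(len(mu)), repeat=k)}   # the kernel f^2
parts = hoeffding_components(g, mu, k)

# (6.4): the components add up to f^2
sum_gap = max(abs(sum(parts[V][x] for V in parts) - g[x]) for x in g)
# canonical property: P_j g_V = 0 whenever j is in V
canon_gap = max(abs(val) for V in parts for j in V
                for val in integrate_out(parts[V], mu, j).values())
```

Both `sum_gap` and `canon_gap` are zero up to rounding, which reflects the identities $\prod_j(P_j+Q_j)=I$ and $P_jQ_j=0$ in this toy setting.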

The identity $S^2_{n,k}(f)(\xi_{j,r},\ 1\le j\le n,\ 1\le r\le k)=k!\,\bar I_{n,k}(f^2)$ holds for all $f\in\mathcal F$, and by writing the Hoeffding decomposition (6.4) for each term $f^2(\xi_{j_1,1},\dots,\xi_{j_k,k})$ in the expression $\bar I_{n,k}(f^2)$ we get that

$$P\Bigl(\sup_{f\in\mathcal F}S^2_{n,k}(f)(\xi_{j,s},\ 1\le j\le n,\ 1\le s\le k)>2^kA^{4/3}n^k\sigma^2\Bigr)\le\sum_{V\subset\{1,\dots,k\}}P\Bigl(\sup_{f\in\mathcal F}n^{k-|V|}|\bar I_{n,|V|}(f_V)|>A^{4/3}n^k\sigma^2\Bigr) \tag{6.5}$$

with the functions $f_V$ in (6.4). We want to give a good estimate for all terms in the sum at the right-hand side in (6.5). For this goal we show that the classes of functions $\{f_V:\ f\in\mathcal F\}$ satisfy the conditions of Proposition 4 for all $V\subset\{1,\dots,k\}$.

The functions $f_V$ are canonical for all $V\subset\{1,\dots,k\}$. (This follows from the commutativity relations between the operators $P_j$ and $Q_j$ mentioned before, the identity $P_jQ_j=0$ and the fact that the canonical property of the function can be expressed in the form $P_jf_V=0$ for all $j\in V$.) We have $|f^2(x_1,\dots,x_k)|\le2^{-2(k+1)}$. The norm of $Q_j$ as a map from the $L_\infty$ space to the $L_\infty$ space is less than 2, the norm of $P_j$ is less than 1, hence $\sup\limits_{x_j\in X,\ j\in V}|f_V(x_j,\ j\in V)|\le2^{-(k+2)}\le2^{-(k+1)}$ for all $V\subset\{1,\dots,k\}$. We have $\int f^4(x_1,\dots,x_k)\mu(dx_1)\dots\mu(dx_k)\le2^{-(k+1)}\sigma^2$, hence $\int f_V^2(x_j,\ j\in V)\prod\limits_{j\in V}\mu(dx_j)\le2^{-(k+1)}\sigma^2\le\sigma^2$ for all $V\subset\{1,\dots,k\}$ by Lemma 1. Finally, to check that the class of functions $\mathcal F_V=\{f_V:\ f\in\mathcal F\}$ is $L_2$-dense with exponent $L$ and parameter $D$ observe that for all probability measures $\rho$ on $(X^k,\mathcal X^k)$ and pairs of functions $f,g\in\mathcal F$ the inequality $\int(f^2-g^2)^2d\rho\le2^{-2k}\int(f-g)^2d\rho$ holds. This implies that if $\{f_1,\dots,f_m\}$, $m\le D\varepsilon^{-L}$, is an $\varepsilon$-dense subset of $\mathcal F$ in the space $L_2(X^k,\mathcal X^k,\rho)$, then the set of functions $\{2^kf_1^2,\dots,2^kf_m^2\}$ is an $\varepsilon$-dense subset of the class of functions $\mathcal F'=\{2^kf^2:\ f\in\mathcal F\}$. Then by Lemma 1 for all $V\subset\{1,\dots,k\}$ the set of functions $\{(f_1)_V,\dots,(f_m)_V\}$ is an $\varepsilon$-dense subset of the class of functions $\mathcal F_V$ in the space $L_2(X^k,\mathcal X^k,\rho)$. This means that $\mathcal F_V$ is also $L_2$-dense with exponent $L$ and parameter $D$.

For $V=\emptyset$ the relation $f_V=\int f^2(x_1,\dots,x_k)\mu(dx_1)\dots\mu(dx_k)\le\sigma^2$ holds, and $|\bar I_{n,|V|}(f_V)|=f_V\le\sigma^2$. Therefore the term corresponding to $V=\emptyset$ in the sum at the right-hand side of (6.5) equals zero if $A_0\ge1$ in the conditions of Proposition 4. The terms corresponding to sets $V$, $1\le|V|\le k$, in these sums satisfy the inequality

$$P\Bigl(\sup_{f\in\mathcal F}|\bar I_{n,|V|}(f_V)|>A^{4/3}n^{|V|}\sigma^2\Bigr)\le P\Bigl(\sup_{f\in\mathcal F}|\bar I_{n,|V|}(f_V)|>A^{4/3}n^{|V|}\sigma^{|V|+1}\Bigr)\le e^{-A^{2/3k}n\sigma^2}\quad\text{if }1\le|V|\le k.$$

This inequality follows from the inductive hypothesis if $|V|<k$, and in the case $V=\{1,\dots,k\}$ from the inequality $A\ge T$ and the assumption that $U$-statistics determined by a class of functions satisfying the conditions of Proposition 4 have a good tail behaviour at level $T^{4/3}$. The last relation together with formula (6.5) imply relation (6.3).

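By conditioning the probability $P(|\bar I^{\varepsilon}_{n,k}(f)|>2^{-(k+2)}An^k\sigma^{k+1})$ with respect to the random variables $\xi_{j,s}$, $1\le j\le n$, $1\le s\le k$, we get with the help of Proposition A that

$$P\bigl(|\bar I^{\varepsilon}_{n,k}(f)|>2^{-(k+2)}An^k\sigma^{k+1}\,\big|\,\xi_{j,s}=x_{j,s},\ 1\le j\le n,\ 1\le s\le k\bigr)\le C\exp\Bigl\{-B\Bigl(\frac{A^2n^{2k}\sigma^{2(k+1)}}{2^{2k+4}S^2_{n,k}(f)(x_{j,s},\ 1\le j\le n,\ 1\le s\le k)}\Bigr)^{1/k}\Bigr\}\le Ce^{-2^{-3-4/k}BA^{2/3k}n\sigma^2} \tag{6.6}$$

for all $f\in\mathcal F$ if $\{x_{j,s},\ 1\le j\le n,\ 1\le s\le k\}\notin H$.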

Given some points $x_{j,s}$, $1\le j\le n$, $1\le s\le k$, define the probability measures $\rho_s$, $1\le s\le k$, uniformly distributed on the set $x_{j,s}$, $1\le j\le n$, i.e. $\rho_s(x_{j,s})=\frac1n$, $1\le j\le n$, and their product $\rho=\rho_1\times\dots\times\rho_k$. If $f$ is a function on $(X^k,\mathcal X^k)$ such that $\int f^2d\rho\le\delta^2$ with some $\delta>0$, then $|f(x_{j_1,1},\dots,x_{j_k,k})|\le\delta n^{k/2}$ for all $1\le s\le k$, $1\le j_s\le n$, and $P\bigl(|\bar I^{\varepsilon}_{n,k}(f)|>\delta n^{3k/2}\,\big|\,\xi_{j,s}=x_{j,s},\ 1\le j\le n,\ 1\le s\le k\bigr)=0$. Choose the numbers $\delta=An^{-k/2}2^{-(k+2)}\sigma^{k+1}$ and $\bar\delta=2^{-(k+2)}n^{-k-1/2}\le\delta$. (The inequality $\bar\delta\le\delta$ holds, since $A\ge A_0\ge1$, and $\sigma\ge n^{-1/2}$.) Choose a $\bar\delta$-dense set $\{f_1,\dots,f_m\}$ in the $L_2(X^k,\mathcal X^k,\rho)$ space with $m\le D\bar\delta^{-L}\le2^{(k+2)L}n^{\beta+(k+1/2)L}$ elements. Then formula (6.6) implies that

$$P\Bigl(\sup_{f\in\mathcal F}|\bar I^{\varepsilon}_{n,k}(f)|>2^{-(k+1)}An^k\sigma^{k+1}\,\Big|\,\xi_{j,s}(\omega)=x_{j,s},\ 1\le j\le n,\ 1\le s\le k\Bigr)\le\sum_{l=1}^{m}P\bigl(|\bar I^{\varepsilon}_{n,k}(f_l)|>2^{-(k+2)}An^k\sigma^{k+1}\,\big|\,\xi_{j,s}(\omega)=x_{j,s},\ 1\le j\le n,\ 1\le s\le k\bigr)\le C2^{(k+2)L}n^{\beta+(k+1/2)L}e^{-2^{-3-4/k}BA^{2/3k}n\sigma^2} \tag{6.7}$$

if $\{x_{j,s},\ 1\le j\le n,\ 1\le s\le k\}\notin H$.
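The passage from an $L_2$-bound with respect to the product of the uniform measures on the sample points to a pointwise bound, and then to a sure bound on the randomized statistic, can be illustrated numerically. This is only a sketch: the dictionary `values` with random entries is a hypothetical stand-in for the numbers $f(x_{j_1,1},\dots,x_{j_k,k})$, and the function names are mine.

```python
from itertools import permutations, product
from math import factorial, sqrt
import random


def l2_norm(values, n, k):
    """sqrt(int f^2 d rho) for rho = product of k uniform measures, each on n sample points."""
    return sqrt(sum(v * v for v in values.values()) / n ** k)


def randomized_statistic(values, eps, k):
    """(1/k!) * sum over pairwise different (j_1..j_k) of eps_{j_1}...eps_{j_k} * values[(j_1..j_k)]."""
    n = len(eps)
    total = 0.0
    for js in permutations(range(n), k):
        sign = 1
        for j in js:
            sign *= eps[j]
        total += sign * values[js]
    return total / factorial(k)


random.seed(1)
n, k = 4, 2
# values[(j_1, ..., j_k)] plays the role of f(x_{j_1,1}, ..., x_{j_k,k})
values = {js: random.uniform(-1, 1) for js in product(range(n), repeat=k)}
delta = l2_norm(values, n, k)

# every atom of rho has mass n^{-k}, hence |f| <= delta * n^{k/2} at each sample point
sup_ok = max(abs(v) for v in values.values()) <= delta * n ** (k / 2) + 1e-12
# at most n^k terms, each bounded by delta * n^{k/2}
stat_ok = all(abs(randomized_statistic(values, eps, k)) <= delta * n ** (3 * k / 2) + 1e-12
              for eps in product((1, -1), repeat=n))
```

The second check holds for every choice of signs, which is why the corresponding conditional probability equals zero and only the finitely many functions of the dense set have to be handled by means of (6.6).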



Relations (6.3) and (6.7) imply that

$$P\Bigl(\sup_{f\in\mathcal F}|\bar I^{\varepsilon}_{n,k}(f)|>2^{-(k+1)}An^k\sigma^{k+1}\Bigr)\le C2^{(k+2)L}n^{\beta+(k+1/2)L}e^{-2^{-3-4/k}BA^{2/3k}n\sigma^2}+2^ke^{-A^{2/3k}n\sigma^2}\quad\text{if }A\ge T. \tag{6.8}$$

Proposition 4 follows from the estimates (5.2) and (6.8) if the constants $A_0$ and $K$ in the condition $n\sigma^2\ge K((L+\beta)\log n+1)$ are chosen sufficiently large. In this case the upper bound these estimates yield for the probability at the left-hand side of (3.8) is smaller than $e^{-A^{1/2k}n\sigma^2}$.
Let us turn to the proof of Proposition 5. By formula (5.8) it is enough to show that

$$P\Bigl(\sup_{f\in\mathcal F}|H_{n,k}(f|G,V_1,V_2)|>\frac{A^2}{2^{4k+1}k!}n^{2k}\sigma^{2(k+1)}\Bigr)\le2^{k+1}e^{-A^{1/2k}n\sigma^2}\quad\text{for all }G\in\mathcal G\text{ and }V_1,V_2\subset\{1,\dots,k\}\text{ if }A\ge A_0 \tag{6.9}$$

with the random variables $H_{n,k}(f|G,V_1,V_2)$ defined in formula (5.7). Let us first prove (6.9) in the case when $|e(G)|=k$, i.e. all vertices of the diagram $G$ are an end-point of some edge, and the expression $H_{n,k}(f|G,V_1,V_2)$ contains no 'symmetrizing term' $\varepsilon_j$. By the Schwarz inequality


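$$|H_{n,k}(f|G,V_1,V_2)|\le\Bigl(\frac{1}{k!}\sum_{\substack{j_1,\dots,j_k,\ 1\le j_s\le n,\\ j_s\ne j_{s'}\text{ if }s\ne s'}}\int f^2(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k},y)\rho(dy)\Bigr)^{1/2}\Bigl(\frac{1}{k!}\sum_{\substack{j'_1,\dots,j'_k,\ 1\le j'_s\le n,\\ j'_s\ne j'_{s'}\text{ if }s\ne s'}}\int f^2(\xi^{(\bar\delta_1)}_{j'_1,1},\dots,\xi^{(\bar\delta_k)}_{j'_k,k},y)\rho(dy)\Bigr)^{1/2} \tag{6.10}$$

for such diagrams $G$, where $\delta_s=1$ if $s\in V_1$, $\delta_s=-1$ if $s\notin V_1$, and $\bar\delta_s=1$ if $s\in V_2$, $\bar\delta_s=-1$ if $s\notin V_2$. Hence

$$\Bigl\{\omega:\ \sup_{f\in\mathcal F}|H_{n,k}(f|G,V_1,V_2)|(\omega)>\frac{A^2n^{2k}\sigma^{2(k+1)}}{2^{4k+1}k!}\Bigr\}\subset\Bigl\{\omega:\ \sup_{f\in\mathcal F}\frac{1}{k!}\sum_{\substack{j_1,\dots,j_k,\ 1\le j_s\le n,\\ j_s\ne j_{s'}\text{ if }s\ne s'}}\int f^2(\xi^{(\delta_1)}_{j_1,1}(\omega),\dots,\xi^{(\delta_k)}_{j_k,k}(\omega),y)\rho(dy)>\frac{A^2n^{2k}\sigma^{2(k+1)}}{2^{4k+1}k!}\Bigr\}\cup\Bigl\{\omega:\ \sup_{f\in\mathcal F}\frac{1}{k!}\sum_{\substack{j'_1,\dots,j'_k,\ 1\le j'_s\le n,\\ j'_s\ne j'_{s'}\text{ if }s\ne s'}}\int f^2(\xi^{(\bar\delta_1)}_{j'_1,1}(\omega),\dots,\xi^{(\bar\delta_k)}_{j'_k,k}(\omega),y)\rho(dy)>\frac{A^2n^{2k}\sigma^{2(k+1)}}{2^{4k+1}k!}\Bigr\}.$$

The last relation implies that

$$P\Bigl(\sup_{f\in\mathcal F}|H_{n,k}(f|G,V_1,V_2)|>\frac{A^2}{2^{4k+1}k!}n^{2k}\sigma^{2(k+1)}\Bigr)\le2P\Bigl(\sup_{f\in\mathcal F}\frac{1}{k!}\sum_{\substack{j_1,\dots,j_k,\ 1\le j_s\le n,\\ j_s\ne j_{s'}\text{ if }s\ne s'}}h_f(\xi_{j_1,1},\dots,\xi_{j_k,k})>\frac{A^2n^{2k}\sigma^{2(k+1)}}{2^{4k+1}k!}\Bigr) \tag{6.11}$$

with $h_f(x_1,\dots,x_k)=\int f^2(x_1,\dots,x_k,y)\rho(dy)$, $f\in\mathcal F$. (In this upper bound we could get rid of the terms $\delta_s$ and $\bar\delta_s$, i.e. of the dependence of the expression $H_{n,k}(f|G,V_1,V_2)$ on the sets $V_1$ and $V_2$, since the probabilities of the events in the previous formula do not depend on these terms.)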

I claim that

$$P\Bigl(\sup_{f\in\mathcal F}|\bar I_{n,k}(h_f)|\ge An^k\sigma^2\Bigr)\le2^ke^{-A^{1/2k}n\sigma^2}\quad\text{for }A\ge A_0 \tag{6.12}$$

if the constants $A_0$ and $K$ are chosen sufficiently large in Proposition 5. Relation (6.12) together with the relation $\frac{A^2n^{2k}\sigma^{2(k+1)}}{2^{4k+1}k!}\ge An^k\sigma^2$ imply that the probability at the right-hand side of (6.11) can be bounded by $2^{k+1}e^{-A^{1/2k}n\sigma^2}$, and the estimate (6.9) holds in the case $|e(G)|=k$. Relation (6.12) can be proved similarly to formula (6.3) in the proof of Proposition 4. It is not difficult to check that $0\le\int h_f(x_1,\dots,x_k)\mu(dx_1)\dots\mu(dx_k)\le\sigma^2$, $\sup|h_f(x_1,\dots,x_k)|\le2^{-2(k+1)}$, and the class of functions $\mathcal H=\{2^kh_f,\ f\in\mathcal F\}$ is an $L_2$-dense class with exponent $L$ and parameter $D$. This means that by applying the Hoeffding decomposition of the functions $h_f$, $f\in\mathcal F$, similarly to formula (6.4) we get such sets of functions $(h_f)_V$, $f\in\mathcal F$, for all $V\subset\{1,\dots,k\}$ which satisfy the conditions of Proposition 4. Hence a natural adaptation of the estimate given for the expression at the right-hand side of (6.5) yields the proof of formula (6.12). Let us observe that by our inductive hypothesis the result of Proposition 4 holds also for $k$, and this allows us to carry out the estimates we need also for the class of functions $(h_f)_V$, $f\in\mathcal F$, with $V=\{1,\dots,k\}$ if $A\ge A_0$.

In the case $|e(G)|<k$ formula (6.9) will be proved with the help of Proposition A. To carry out this proof first an appropriate expression will be introduced and bounded for all sets $V_1,V_2\subset\{1,\dots,k\}$ and diagrams $G\in\mathcal G$ such that $|e(G)|<k$. Before defining the expression we shall bound, some notations will be introduced.

Let us consider the set $J_0(G)=J_0(G,k,n)$,

$$J_0(G)=\{(j_1,\dots,j_k,j'_1,\dots,j'_k):\ 1\le j_s,j'_s\le n,\ 1\le s\le k,\ j_s\ne j_{s'}\text{ if }s\ne s',\ j'_s\ne j'_{s'}\text{ if }s\ne s',\ j_s=j'_{s'}\text{ if }(s,s')\in e(G),\ j_s\ne j'_{s'}\text{ if }(s,s')\notin e(G)\},$$

the set of those sequences $(j_1,\dots,j_k,j'_1,\dots,j'_k)$ which appear as indices in the summation in formula (5.7). I give a partition of $J_0(G)$ appropriate for our purposes.

For this aim let us first define the sets $M_1=M_1(G)=\{s(1),\dots,s(k-|e(G)|)\}=\{1,\dots,k\}\setminus v_1(G)$, $s(1)<\dots<s(k-|e(G)|)$, and $M_2=M_2(G)=\{\bar s(1),\dots,\bar s(k-|e(G)|)\}=\{1,\dots,k\}\setminus v_2(G)$, $\bar s(1)<\dots<\bar s(k-|e(G)|)$, the sets of those vertices of the first and second row of the diagram $G$ in increasing order from which no edges start. Let us also introduce the set $V(G)=V(G,n,k)$,

$$V(G)=\{(j_{s(1)},\dots,j_{s(k-|e(G)|)},j'_{\bar s(1)},\dots,j'_{\bar s(k-|e(G)|)}):\ 1\le j_{s(p)},j'_{\bar s(p)}\le n,\ 1\le p\le k-|e(G)|,\ j_{s(p)}\ne j_{s(p')},\ j'_{\bar s(p)}\ne j'_{\bar s(p')}\text{ if }p\ne p',\ 1\le p,p'\le k-|e(G)|\},$$

which is the set consisting of the restrictions of the coordinates of the vectors $(j_1,\dots,j_k,j'_1,\dots,j'_k)\in J_0(G)$ to $M_1\cup M_2$. Given a vector $v\in V(G)$ let $v(r)$, $1\le r\le k-|e(G)|$, and $\bar v(r)$, $1\le r\le k-|e(G)|$, denote its coordinates corresponding to the sets $M_1$ and $M_2$ respectively. Put

$$E_G(v)=\{(j_1,\dots,j_k,j'_1,\dots,j'_k):\ 1\le j_s\le n\text{ if }s\in v_1(G),\ 1\le j'_s\le n\text{ if }s\in v_2(G),\ j_s\ne j_{s'},\ j'_s\ne j'_{s'}\text{ if }s\ne s',\ j_s=j'_{s'}\text{ if }(s,s')\in e(G)\text{ and }j_s\ne j'_{s'}\text{ if }(s,s')\notin e(G),\ j_{s(r)}=v(r),\ j'_{\bar s(r)}=\bar v(r),\ 1\le r\le k-|e(G)|\},\quad v\in V(G),$$

where $\{s(1),\dots,s(k-|e(G)|)\}=M_1$, $\{\bar s(1),\dots,\bar s(k-|e(G)|)\}=M_2$ in the last line of this definition. The set $E_G(v)$ contains those vectors in $J_0(G)$ whose coordinates in $M_1\cup M_2$ are prescribed by the vector $v\in V(G)$ and the remaining coordinates are chosen freely.

Now we define the partition

$$J_0(G)=\bigcup_{v\in V(G)}E_G(v)$$

of the set $J_0(G)$.
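The partition of $J_0(G)$ into the classes $E_G(v)$ can be enumerated directly for small parameters. The following sketch is only an illustration under my own assumptions: the function names are hypothetical, a diagram is passed as a set of 1-based edges $(s,s')$, and only those $v$ are listed which really occur as restrictions of elements of $J_0(G)$.

```python
from collections import defaultdict
from itertools import product


def index_set_J0(k, n, edges):
    """All (j_1..j_k, j'_1..j'_k) in {1..n}^{2k} obeying the constraints of the diagram."""
    result = []
    for js in product(range(1, n + 1), repeat=2 * k):
        top, bottom = js[:k], js[k:]
        if len(set(top)) < k or len(set(bottom)) < k:   # indices within a row must differ
            continue
        # j_s = j'_{s'} exactly when (s, s') is an edge of the diagram
        if all((top[s] == bottom[t]) == ((s + 1, t + 1) in edges)
               for s in range(k) for t in range(k)):
            result.append(js)
    return result


def partition_by_free_vertices(k, n, edges):
    """Group J0(G) by the coordinates at the vertices from which no edge starts."""
    free_top = [s for s in range(k) if all(e[0] != s + 1 for e in edges)]      # M_1
    free_bottom = [t for t in range(k) if all(e[1] != t + 1 for e in edges)]   # M_2
    classes = defaultdict(list)
    for js in index_set_J0(k, n, edges):
        v = tuple(js[s] for s in free_top) + tuple(js[k + t] for t in free_bottom)
        classes[v].append(js)
    return classes


k, n, edges = 2, 5, {(1, 2)}          # one edge: vertex 1 above joined to vertex 2 below
classes = partition_by_free_vertices(k, n, edges)
```

In this example $J_0(G)$ has $5\cdot4\cdot3=60$ elements, there are $20$ non-empty classes, and each class has $3$ elements, one for every value the index on the single edge can still take.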
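The inequality

$$P\bigl(S(\mathcal F|G,V_1,V_2)>A^{8/3}n^{2k}\sigma^4\bigr)\le2^{k+1}e^{-A^{2/3k}n\sigma^2}\quad\text{if }A\ge A_0\text{ and }|e(G)|<k \tag{6.13}$$

will be proved for the random variable

$$S(\mathcal F|G,V_1,V_2)=\sup_{f\in\mathcal F}\sum_{v\in V(G)}\Bigl(\sum_{(j_1,\dots,j_k,j'_1,\dots,j'_k)\in E_G(v)}\frac{1}{k!^2}\int f(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k},y)f(\xi^{(\bar\delta_1)}_{j'_1,1},\dots,\xi^{(\bar\delta_k)}_{j'_k,k},y)\rho(dy)\Bigr)^2, \tag{6.13'}$$

where $\delta_s=1$ if $s\in V_1$, $\delta_s=-1$ if $s\notin V_1$, and $\bar\delta_s=1$ if $s\in V_2$, $\bar\delta_s=-1$ if $s\notin V_2$.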

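To prove formula (6.13) let us first fix some $v\in V(G)$ and apply the Schwarz inequality. It yields that

$$\Bigl(\sum_{(j_1,\dots,j_k,j'_1,\dots,j'_k)\in E_G(v)}\frac{1}{k!^2}\int f(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k},y)f(\xi^{(\bar\delta_1)}_{j'_1,1},\dots,\xi^{(\bar\delta_k)}_{j'_k,k},y)\rho(dy)\Bigr)^2\le\Bigl(\sum_{(j_1,\dots,j_k,j'_1,\dots,j'_k)\in E_G(v)}\frac{1}{k!^2}\int f^2(\xi^{(\delta_1)}_{j_1,1},\dots,\xi^{(\delta_k)}_{j_k,k},y)\rho(dy)\Bigr)\Bigl(\sum_{(j_1,\dots,j_k,j'_1,\dots,j'_k)\in E_G(v)}\frac{1}{k!^2}\int f^2(\xi^{(\bar\delta_1)}_{j'_1,1},\dots,\xi^{(\bar\delta_k)}_{j'_k,k},y)\rho(dy)\Bigr)$$

for all $v\in V(G)$. Summing up these inequalities for all $v\in V(G)$ we get that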
S(F\G, Vi, V2) < sup £ ( £ J . . . , ££U)p( dy) 



E / 2 (ef; 1 ) ,-..,eg,y)p(dy) (6.14) 

^(jiv,jfc,ji,---,jfc)es G (v) / 



(3i, — ,3k,ji,-;j' k )&Jo(G) 

E 



sup 



e2( C (Si) c (S k ) 

^(ji,---,jk,j' 1 ,---,j' k )eJ (G) 



To check formula (6.14) we have to observe that by multiplying the inner sums at the left-hand side of this inequality each product of an integral $\int f^2(\xi^{(1,\delta_1)}_{j_1},\dots,\xi^{(k,\delta_k)}_{j_k},y)\,\rho(dy)$ with an integral $\int f^2(\xi^{(1,\bar\delta_1)}_{j'_1},\dots,\xi^{(k,\bar\delta_k)}_{j'_k},y)\,\rho(dy)$ appears only once. (In particular, it is determined which index $v\in V(G)$ has to be taken in the outer sum to get this term. The coordinates of this vector $v$ agree with the coordinates of the vector $j=(j_1,\dots,j_k,j'_1,\dots,j'_k)$ in $M_1\cup M_2$, that is, with the coordinates of the vector $j$ which correspond to those vertices from which no edges of the diagram $G$ start.) Besides this, all these products appear if the multiplications at the right-hand expression are carried out.
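The two steps behind (6.14), the Schwarz inequality inside each block of a partition followed by enlarging a sum of products to a product of sums, can be illustrated on a toy finite example. The sketch below is only an analogue under simplifying assumptions: plain numbers `a`, `b` replace the integrands, each list in `blocks` plays the role of one set $E_G(v)$, and the function name `schwarz_chain` is hypothetical.

```python
import random


def schwarz_chain(blocks):
    """Return the three quantities compared in the toy version of (6.14).

    blocks: list of lists of (a, b) pairs; each inner list is one block
    of the partition (a stand-in for one set E_G(v)).
    """
    # Sum over blocks of the squared inner product inside the block.
    lhs = sum(sum(a * b for a, b in blk) ** 2 for blk in blocks)
    # Schwarz inequality applied block by block.
    mid = sum(
        sum(a * a for a, _ in blk) * sum(b * b for _, b in blk)
        for blk in blocks
    )
    # Product of the two full sums: contains every block product plus
    # further non-negative cross terms.
    rhs = sum(a * a for blk in blocks for a, _ in blk) * sum(
        b * b for blk in blocks for _, b in blk
    )
    return lhs, mid, rhs


rng = random.Random(0)
blocks = [
    [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(rng.randint(1, 5))]
    for _ in range(10)
]
lhs, mid, rhs = schwarz_chain(blocks)
print(lhs <= mid <= rhs)
```

The second comparison holds because all summands are non-negative, which is the same reason the products on the right-hand side of (6.14) dominate those on the left.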

Relation (6.14) implies that

$$P\left(S(\mathcal F|G,V_1,V_2)>A^{8/3}n^{2k}\sigma^4\right)\le 2P\left(\sup_{f\in\mathcal F}I_{n,k}(h_f)>A^{4/3}n^k\sigma^2\right)$$

with $h_f(x_1,\dots,x_k)=\int f^2(x_1,\dots,x_k,y)\,\rho(dy)$. (Here we exploited that in the last formula $S(\mathcal F|G,V_1,V_2)$ is bounded by the product of two random variables whose distributions do not depend on the sets $V_1$ and $V_2$.) Thus to prove inequality (6.13) it is enough to show that

$$2P\left(\sup_{f\in\mathcal F}I_{n,k}(h_f)>A^{4/3}n^k\sigma^2\right)\le 2^{k+1}e^{-A^{2/3k}n\sigma^2}\quad\text{if } A\ge A_0. \tag{6.15}$$

Actually formula (6.15) has already been proved: only formula (6.12) has to be applied with the parameter $A$ replaced by $A^{4/3}$.

The proof of Proposition 5 can be completed similarly to Proposition 4. It follows 
from Proposition A that 



P (\H n , k (f\G, Vl ,V2)\ > ^M^ V(fc+1) 



Zil,l<3<n,l<s<k)(u;) 



"3,s 



< Ce - B2 - (4+2/k) W- 1/kAVSk ™* if S(f\G, V U V 2 ))(u>) < A^n 2k a A 

for all $f\in\mathcal F$, $G\in\mathcal G$, $|e(G)|<k$, and $V_1,V_2\subset\{1,\dots,k\}$ if $A\ge A_0$.

(6.16) 
Indeed, in this case the conditional probability considered in (6.16) can be bounded by

I / 4 4 4fc 4(fe + l) A 1 / 2 '' ] f / 4 4 /3 2k 4fc\ 1 / 2 i| 

Cexp |-S ( k 28fc+ 4( fc ?; 2 g'( J r| G)Vl|Va) J j < Cexp j-B ) j, where 2j = 

2k — 2|e(G)|, the number of vertices of the diagram G from which no edges start. Since 
j < k, no 2 > 1, and also 2 & ^+ A {k\) 2 — 1 ^ A) is chosen sufficiently large the above 
calculation implies relation (6.16). 

Let us show that also the inequality 



p ( sup \H nik {f\G,v u v 2 )\ > ^T^ 2fe ^ (fe+1) 



tf,l,l<j<n,l<s<k)(u) 



< Cn (3k+l)L/2+P e -BA^n**/2^Hkl)^ if y^ y^ < 

for all $G\in\mathcal G$, $|e(G)|<k$, and $V_1,V_2\subset\{1,\dots,k\}$ if $A\ge A_0$

(6.17) 

holds. 

To deduce formula (6.17) let us fix an elementary event $\omega\in\Omega$ which satisfies the relation $S(\mathcal F|G,V_1,V_2)(\omega)\le A^{8/3}n^{2k}\sigma^4$, two sets $V_1,V_2\subset\{1,\dots,k\}$ and a diagram $G$, consider the points $x_j^{(s,\pm1)}=x_j^{(s,\pm1)}(\omega)=\xi_j^{(s,\pm1)}(\omega)$, $1\le j\le n$, $1\le s\le k$, and introduce with their help the following probability measures: For all $1\le s\le k$ define the probability measures $\nu_s^{(1)}$ which are uniformly distributed on the points $x_j^{(s,\delta_s)}$, $1\le j\le n$, and $\nu_s^{(2)}$ which are uniformly distributed on the points $x_j^{(s,\bar\delta_s)}$, $1\le j\le n$, where $\delta_s=1$ if $s\in V_1$, $\delta_s=-1$ if $s\notin V_1$, and similarly $\bar\delta_s=1$ if $s\in V_2$ and $\bar\delta_s=-1$ if $s\notin V_2$. Let us consider the product measures $\alpha_1=\nu_1^{(1)}\times\cdots\times\nu_k^{(1)}\times\rho$, $\alpha_2=\nu_1^{(2)}\times\cdots\times\nu_k^{(2)}\times\rho$ on the product space $(X^k\times Y,\mathcal X^k\times\mathcal Y)$, where $\rho$ is that probability measure on $(Y,\mathcal Y)$ which appears in Proposition 5, together with the measure $\alpha=\frac{\alpha_1+\alpha_2}{2}$. Given two functions $f\in\mathcal F$ and $g\in\mathcal F$ we give an upper bound for $|H_{n,k}(f|G,V_1,V_2)(\omega)-H_{n,k}(g|G,V_1,V_2)(\omega)|$ if $\int(f-g)^2\,d\alpha\le\delta^2$ with some $\delta>0$. (This bound does not depend on the 'randomizing terms' $\varepsilon_j(\omega)$ in the definition of the random variable $H_{n,k}(\cdot|G,V_1,V_2)$.)

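In this case $\int(f-g)^2\,d\alpha_j\le 2\delta^2$, $j=1,2$, and

$$\int\bigl|f(x^{(1,\delta_1)}_{j_1},\dots,x^{(k,\delta_k)}_{j_k},y)-g(x^{(1,\delta_1)}_{j_1},\dots,x^{(k,\delta_k)}_{j_k},y)\bigr|^2\rho(dy)\le 2\delta^2n^k,$$

$$\int\bigl|f(x^{(1,\delta_1)}_{j_1},\dots,x^{(k,\delta_k)}_{j_k},y)-g(x^{(1,\delta_1)}_{j_1},\dots,x^{(k,\delta_k)}_{j_k},y)\bigr|\,\rho(dy)\le\sqrt2\,\delta n^{k/2}$$

for all $1\le s\le k$ and $1\le j_s\le n$, and the same result holds if every $\delta_s$ is replaced by $\bar\delta_s$, $1\le s\le k$. Since $|f|\le1$ for $f\in\mathcal F$, the condition $\int(f-g)^2\,d\alpha\le\delta^2$ implies that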
for all vectors $(j_1,\dots,j_k,j'_1,\dots,j'_k)$ which appear as an index in the summation in (5.7), and

$$|H_{n,k}(f|G,V_1,V_2)(\omega)-H_{n,k}(g|G,V_1,V_2)(\omega)|\le 2\sqrt2\,\delta n^{5k/2}$$

if the originally fixed $\omega\in\Omega$ is considered.

Put 5 = ^^(ik+T/llll^ i an d <5 = n~( 3h+1 ' ) / 2 < 5 (since a > and we may 
assume that $A\ge A_0$ is sufficiently large), choose a $\delta$-dense subset $\{f_1,\dots,f_m\}$ in the $L_2(X^k\times Y,\mathcal X^k\times\mathcal Y,\alpha)$ space with $m\le D\delta^{-L}\le n^{(3k+1)L/2+\beta}$ elements. Relation (6.16) for these functions together with the above estimates yields formula (6.17).

It follows from relations (6.13) and (6.17) that 

P L^\H n M\G^V 2 )\ > ^y^ 2(fc+1) ) <2 k ^e- AV3k ™ 2 

+ Cn (3k+l)L/2+f3 e -BA^ k n^/2^/ k \k^/ k if A > A 

for all $V_1,V_2\subset\{1,\dots,k\}$ also in the case $|e(G)|\le k-1$. This means that relation (6.9) holds also in this case if the constants $A_0$ and $K$ are chosen sufficiently large in Proposition 5. Proposition 5 is proved.

Appendix. The proof of Proposition A 

The proof will be based on the hypercontractive inequality for Rademacher functions. 
Let me first recall this result. 

The hypercontractive inequality for Rademacher functions. Let us consider the measure spaces $(X,\mathcal X,\mu)$ and $(Y,\mathcal Y,\nu)=(X,\mathcal X,\mu)$ defined as $X=\{-1,1\}$, $\mathcal X$ contains all subsets of $X$, and $\mu(\{1\})=\mu(\{-1\})=\frac12$. Given a real number $\gamma>0$ let us define the linear operator $T_\gamma$ which maps the real (or complex) valued functions on the space $X$ to the real (or complex) valued functions on the space $Y$, and satisfies the relations $T_\gamma r_0=r_0$ and $T_\gamma r_1=\gamma r_1$, where $r_0(1)=r_0(-1)=1$, and $r_1(1)=1$, $r_1(-1)=-1$. For all $n=1,2,\dots$ let us consider the $n$-fold products $(X^n,\mathcal X^n,\mu^n)$ and $(Y^n,\mathcal Y^n,\nu^n)$ together with the $n$-fold product $T_\gamma^n$ of the operator $T_\gamma$ (i.e. $T_\gamma^n$ is the linear transformation for which $T_\gamma^n(f_1(x_1)\cdots f_n(x_n))=T_\gamma f_1(x_1)\cdots T_\gamma f_n(x_n)$ for all functions $f_s$, $1\le s\le n$, on the space $(X,\mathcal X,\mu)$). For all $n=1,2,\dots$ the transformation $T_\gamma^n$ from the space $L_p(X^n,\mathcal X^n,\mu^n)$ to the space $L_q(Y^n,\mathcal Y^n,\nu^n)$ has the norm 1 if $1<p\le q<\infty$ and $\gamma\le\sqrt{\frac{p-1}{q-1}}$.

The following corollary of the hypercontractive inequality is useful for us. 

Corollary of the hypercontractive inequality. Let $\varepsilon_1,\dots,\varepsilon_n$ be independent, identically distributed random variables, $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, $1\le j\le n$, fix some real numbers $a(j_1,\dots,j_k)$ for all indices $(j_1,\dots,j_k)$ such that $1\le j_s\le n$, $1\le s\le k$, and $j_s\ne j_{s'}$ if $s\ne s'$, and define the random variable

$$Z=\sum_{1\le j_s\le n,\ 1\le s\le k}a(j_1,\dots,j_k)\,\varepsilon_{j_1}\cdots\varepsilon_{j_k}.$$

The inequality

$$E|Z|^q\le\left(\frac{q-1}{p-1}\right)^{kq/2}\left(E|Z|^p\right)^{q/p}\quad\text{if } 1<p\le q<\infty$$

holds.

Proof of the Corollary. Let us define the function

$$f(x_1,\dots,x_n)=\sum_{1\le j_s\le n,\ 1\le s\le k}a(j_1,\dots,j_k)\,r_1(x_{j_1})\cdots r_1(x_{j_k})$$

on the space $(X^n,\mathcal X^n,\mu^n)$. Observe that $T_\gamma^nf=\gamma^kf$ for this function $f$ and all $\gamma>0$, and $E|Z|^p=\|f\|_p^p$, $E|Z|^q=\|f\|_q^q$. Fix some numbers $1<p\le q<\infty$, and put $\gamma=\sqrt{\frac{p-1}{q-1}}$. Since the norm of $T_\gamma^n$ as a transformation from the space $L_p(X^n,\mathcal X^n,\mu^n)$ to the space $L_q(Y^n,\mathcal Y^n,\nu^n)$ is 1, $\|T_\gamma^nf\|_q=\gamma^k\|f\|_q\le\|f\|_p$. The above relations imply that $(E|Z|^q)^{1/q}\le\gamma^{-k}(E|Z|^p)^{1/p}=\left(\frac{q-1}{p-1}\right)^{k/2}(E|Z|^p)^{1/p}$ in this case, and raising both sides to the power $q$ gives what we had to show.
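The moment inequality of the Corollary can be checked numerically for small parameters, since $E|Z|^r$ is a finite average over the $2^n$ sign vectors. The following is a minimal sketch, not part of the proof: the function names, the random coefficients and the default values of `n`, `k`, `p`, `q` are illustrative assumptions.

```python
import itertools
import math
import random


def moments_of_Z(a, n, powers):
    """Exact E|Z|^r for each r in powers, by enumerating all 2^n sign vectors.

    a maps index tuples (j_1, ..., j_k) with pairwise distinct entries
    to the coefficients a(j_1, ..., j_k).
    """
    totals = {r: 0.0 for r in powers}
    for eps in itertools.product((-1, 1), repeat=n):
        z = sum(c * math.prod(eps[j] for j in idx) for idx, c in a.items())
        for r in powers:
            totals[r] += abs(z) ** r
    return {r: t / 2 ** n for r, t in totals.items()}


def check_corollary(n=6, k=2, p=2.0, q=4.0, seed=0):
    """Return both sides of E|Z|^q <= ((q-1)/(p-1))^(kq/2) (E|Z|^p)^(q/p)."""
    rng = random.Random(seed)
    # Arbitrary (hypothetical) coefficients on tuples of distinct indices.
    a = {idx: rng.uniform(-1, 1) for idx in itertools.permutations(range(n), k)}
    m = moments_of_Z(a, n, (p, q))
    lhs = m[q]
    rhs = ((q - 1) / (p - 1)) ** (k * q / 2) * m[p] ** (q / p)
    return lhs, rhs


lhs, rhs = check_corollary()
print(lhs <= rhs)
```

Exact enumeration is used instead of sampling so that the comparison is free of Monte Carlo error; it is feasible only for small `n`.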

Applying the corollary with $p=2$ and some $q\ge2$ we get that

$$E|Z|^q\le(q-1)^{kq/2}\left(EZ^2\right)^{q/2}\le q^{kq/2}\left(EZ^2\right)^{q/2}=q^{kq/2}\bar S^q$$

with

$$\bar S^2=\sum_{1\le j_1<j_2<\cdots<j_k\le n}\Biggl(\sum_{\pi\in\Pi_k}a(j_{\pi(1)},\dots,j_{\pi(k)})\Biggr)^2,$$

where $\Pi_k$ denotes the set of all permutations of the set $\{1,\dots,k\}$. Observe that

$$\Biggl(\sum_{\pi\in\Pi_k}a(j_{\pi(1)},\dots,j_{\pi(k)})\Biggr)^2\le k!\sum_{\pi\in\Pi_k}a^2(j_{\pi(1)},\dots,j_{\pi(k)})\quad\text{for all } 1\le j_1<\cdots<j_k\le n,$$

hence $\bar S^2\le k!\,S^2$, and $E|Z|^q\le q^{kq/2}(k!)^{q/2}S^q$ with the number $S^2$ defined in (3.4). Thus the Markov inequality implies that

$$P(|Z|>x)\le\left(\frac{q^{k/2}\sqrt{k!}\,S}{x}\right)^q\quad\text{for all } x>0 \text{ and } q\ge2.$$

Choose the number $q$ as the solution of the equation $q\left(\frac{k!\,S^2}{x^2}\right)^{1/k}=\frac1e$. Then we get that

$$P(|Z|>x)\le\exp\left\{-B\left(\frac xS\right)^{2/k}\right\}\quad\text{with } B=\frac{k}{2e(k!)^{1/k}},$$

provided that $q=\frac1e\left(\frac{x^2}{k!\,S^2}\right)^{1/k}\ge2$, i.e. $B\left(\frac xS\right)^{2/k}\ge k$. By multiplying the above upper bound with $C=e^k$ we get an estimate for $P(|Z|>x)$ which holds for all $x>0$.
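The choice of $q$ and the resulting tail estimate can be checked numerically. The sketch below rests on stated assumptions: it takes $S^2$ to be the sum of the squares $a^2(j_1,\dots,j_k)$ over all admissible index tuples (formula (3.4) is not reproduced in this section), and the function names and the values of `n`, `k` and `x` are hypothetical.

```python
import itertools
import math
import random


def B_const(k):
    """B = k / (2 e (k!)^(1/k)), the constant in the exponent."""
    return k / (2 * math.e * math.factorial(k) ** (1 / k))


def tail_bound(x, S, k):
    """C * exp(-B (x/S)^(2/k)) with C = e^k."""
    return math.exp(k) * math.exp(-B_const(k) * (x / S) ** (2 / k))


def optimized_markov(x, S, k):
    """Markov bound (q^(k/2) sqrt(k!) S / x)^q at the chosen value of q."""
    q = (x * x / (math.factorial(k) * S * S)) ** (1 / k) / math.e
    return q, (q ** (k / 2) * math.sqrt(math.factorial(k)) * S / x) ** q


def exact_tail(a, n, x):
    """P(|Z| > x), computed exactly over all 2^n sign vectors."""
    hits = 0
    for eps in itertools.product((-1, 1), repeat=n):
        z = sum(c * math.prod(eps[j] for j in idx) for idx, c in a.items())
        hits += abs(z) > x
    return hits / 2 ** n


n, k = 8, 2
rng = random.Random(1)
a = {idx: rng.uniform(-1, 1) for idx in itertools.permutations(range(n), k)}
# Assumed form of S^2: sum of squared coefficients over all index tuples.
S = math.sqrt(sum(c * c for c in a.values()))
results = [
    (x, exact_tail(a, n, x), tail_bound(x, S, k))
    for x in (0.5 * S, S, 2 * S, 4 * S)
]
for x, exact, bound in results:
    print(f"x/S={x / S:.1f}  exact={exact:.4f}  bound={bound:.4f}")
```

For small $x/S$ the bound exceeds 1 and carries no information, which is exactly the range where the factor $C=e^k$ is needed to make the estimate valid without the restriction $q\ge2$.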